ABSTRACT

Knowledge bases of entities and relations (constructed either manually or automatically) are behind many real-world search engines, including those at Yahoo!, Microsoft, and Google. These knowledge bases can be viewed as graphs, with nodes representing entities and edges representing (primary) relationships, and various studies have examined how to leverage them to answer entity-seeking queries. Meanwhile, in a complementary direction, analyses of query logs have enabled researchers to identify entity pairs that are statistically correlated. Such entity relationships are then presented to search users through the "related searches" feature in modern search engines. However, entity relationships discovered this way can often be "puzzling" to users, because why the entities are connected is often not apparent. In this paper, we propose a novel problem called entity relationship explanation, which seeks to explain why a pair of entities are connected, and we solve this challenging problem by integrating the two complementary approaches above: we leverage the knowledge base to "explain" the connections discovered between entity pairs.

More specifically, we present REX, a system that takes a pair of entities in a given knowledge base as input and efficiently identifies a ranked list of relationship explanations. We formally define relationship explanations and analyze their desirable properties. Furthermore, we design and implement algorithms to efficiently enumerate and rank all relationship explanations based on multiple measures of "interestingness." We perform extensive experiments over real web-scale data gathered from DBpedia and a commercial search engine, demonstrating the efficiency and scalability of REX. We also perform user studies to corroborate the effectiveness of the explanations generated by REX.

1. INTRODUCTION 

Search companies have been eager to evolve beyond the "ten blue links" model and are introducing a suite of features to help online users search and explore information more effectively. Among those features, one of the most intuitive is the so-called related entities feature: when a user searches for an entity, a list of entities that are related to the given entity in some way is also shown to the user. This feature can be seen on major search engines such as Google and Yahoo! (screenshots in Figure 1).

Figure 1: Related entities feature on the left panel of Google (left) and Yahoo! (right).

However, given an entity, why certain entities are considered related is often a mystery to the user. For example, it is difficult for users other than film junkies to understand why 'Tom Cruise' and 'Brad Pitt' are related, beyond the fact that they are both popular actors. Informal user studies at Yahoo! indicate that augmenting related suggestions with concrete explanations would significantly increase the relevance of the suggestions and increase user engagement. Motivated by these studies, we aim to eliminate the mystery behind suggestions by providing relationship explanations: given a pair of entities, our goal is to effectively and efficiently produce explanations that describe how the entities are related, based on a large knowledge base that maintains structured information about all entities¹. We chose knowledge bases as the source of explanations because of their widespread availability behind search engines. As a very simple example of such an explanation, when 'Nicole Kidman' is shown as related to 'Tom Cruise', we would like to let users know that they used to be married. A slightly more sophisticated explanation arises when 'Brad Pitt' is shown as related to 'Tom Cruise': we would like to show that they co-starred in a number of movies, perhaps including one or more example movies, say 'Interview with the Vampire'.

In this study, we separate the explanation generation mechanism from the related entity selection mechanism, and focus on generating explanations for a pair of entities already found to be related. The main motivation for decoupling explanations based on a knowledge graph from the reason a pair of entities was deemed related is that, in most search engines, the related entity generation mechanisms are not semantically meaningful. For example, two entities can be considered related simply because search users often query them together in one session.

¹ Note that we are separating the explanation generation mechanism from the related entity selection mechanism, and focus on generating explanations given a pair of entities already found to be related.

Figure 2: Example explanation for 'Tom Cruise' & 'Brad Pitt'. The graph pattern is on the left, and one of the instances associated with the pattern is on the right.

Intuitively, we consider a relationship explanation as a constrained graph pattern and its associated graph instances derivable from the underlying knowledge base. Specifically, the graph pattern (similar to a graph query) contains variables as nodes and labeled relationships as edges, and the instances can be considered as the results of applying the graph pattern to the underlying knowledge base. One such example is shown in Figure 2, and we introduce the formal definitions in Section 2.

The overall process of relationship explanation consists of two main steps: (1) Explanation Enumeration: given two entities, the starting one (i.e., the one the user searched for) and the ending one (i.e., the one suggested by the search engine), identify a list of candidate explanations; (2) Explanation Ranking: rank the candidate explanations based on a set of measures to identify the most interesting explanations to be returned to the user. Both steps involve significant semantic and algorithmic challenges. First, since the knowledge base typically contains several million nodes, efficiently enumerating candidate explanations is an arduous task. Second, explanation ranking involves two significant challenges: defining suitable measures that effectively capture the interestingness of explanations, and computing those measures for a large number of explanations in near real time. Finally, we also seek opportunities to perform aggressive pruning when combining enumeration and ranking.

It is worth noting that there is a substantial body of existing work on mining connecting structures from graphs, such as keyword search in relational and semi-structured databases [1, 2, 3, 5, 17, 12, 13, 14, 21, 24, 29, 15] and graph mining [8, 10, 18, 22, 25]. The key differentiating contributions of REX are that it considers connecting structures more complex than trees and paths for explaining the relationship between two entities, and that it introduces two novel families of pattern-level interestingness measures.

To the best of our knowledge, this is the first work addressing and formalizing the problem of generating relationship explanations for a pair of entities. We make the following main contributions. First, we formally define the notion of relationship explanation and carefully analyze the properties of desirable explanations (Section 2). Second, we design and implement efficient algorithms for enumerating candidate explanations (Section 3). Third, we propose different interestingness measures for ranking relationship explanations, and design and implement efficient algorithms for ranking explanations (Section 4). Finally, we perform user studies and extensive experiments to demonstrate the effectiveness and efficiency of our algorithms (Section 5).

2. FUNDAMENTALS 

In this section, we formally introduce the relationship explanation problem. We start by describing the input knowledge base (Section 2.1) from which the relationship explanations are generated. In Section 2.2, we introduce the formal definition of a relationship explanation, which is composed of two essential components: the relationship explanation pattern and the relationship explanation instances. In Section 2.3, we describe important properties of relationship explanations in terms of their graph structure. The subset of relationship explanations that best satisfy the desired properties are called minimal explanations, and they are the focus of the remainder of our study.

Figure 3: A subset of the entertainment knowledge base.

2.1 Knowledge Base 

As motivated in Section 1, we construct explanations from an input knowledge base, which is formally represented as a graph that consists of entities (e.g., persons, movies, etc.) as nodes, and primary relationships between entities (e.g., starring, spouse, etc.) as edges². Entities have unique IDs (e.g., brad pitt)³, and edges can be either directed (e.g., starring) or undirected (e.g., spouse). Therefore, a knowledge base can be represented as a three-tuple $G = (V, E, \lambda)$, where $V$ is the set of nodes, $E$ is the set of edges, and $\lambda: E \to \Sigma$ is the edge labeling function, which assigns each edge a relationship label from the label set $\Sigma$.

Figure 3 illustrates a simple running example, which is a subset of the entertainment knowledge base behind the Yahoo! search engine (the actual knowledge base contains 200K nodes and over 1M edges extracted from DBpedia). The primary relationships are represented as solid lines with arrows (directed relationships) or without arrows (undirected relationships).
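As a concrete, purely illustrative rendering of $G = (V, E, \lambda)$, the Python sketch below stores a labeled multigraph in which $\lambda$ is a dictionary from edge ids to labels. The class name `KnowledgeBase`, the integer edge ids, the three sample edges, and the choice to index every edge from both endpoints (so that later traversals can ignore direction) are our own assumptions, not details of the Yahoo! knowledge base.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Hypothetical sketch of G = (V, E, lambda) as a labeled multigraph."""
    V: set = field(default_factory=set)
    E: dict = field(default_factory=dict)    # edge id -> (u, v, directed)
    lam: dict = field(default_factory=dict)  # edge id -> label, i.e. lambda
    adj: dict = field(default_factory=lambda: defaultdict(list))

    def add_edge(self, u, v, label, directed=True):
        eid = len(self.E)
        self.V.update((u, v))
        self.E[eid] = (u, v, directed)
        self.lam[eid] = label
        # Index the edge from both endpoints; the direction stays in E[eid].
        self.adj[u].append((v, eid))
        self.adj[v].append((u, eid))
        return eid

    def degree(self, v):
        return len(self.adj[v])


kb = KnowledgeBase()
kb.add_edge("brad pitt", "angelina jolie", "spouse", directed=False)
kb.add_edge("brad pitt", "mr. & mrs. smith", "starring")
kb.add_edge("angelina jolie", "mr. & mrs. smith", "starring")
```

Keeping direction in `E` rather than in the adjacency index lets a traversal walk edges in either direction while still recovering whether each step follows or opposes an arrow.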

2.2 Relationship Explanation 

Intuitively, a relationship explanation is a constrained graph pattern along with its associated instances that are derivable from the knowledge base. We use the terms relationship explanation pattern and relationship explanation instance to describe the two components, respectively. The existence of a relationship explanation pattern is independent of the knowledge base. However, an explanation pattern is only meaningful if its associated relationship explanation instances can be found in the knowledge base with respect to the given entity pair. More concretely, the relationship explanation pattern is modeled as a graph structure that connects two target nodes representing the given entity pair. Edges in the structure have constant labels, and the remaining nodes in the structure are variables:

Definition 1 (Relationship Explanation Pattern). A relationship explanation pattern can be represented as a 5-tuple $p = (V, E, \lambda, v_{start}, v_{end})$, where $V$ is the set of node variables, with two special variables $v_{start}$ and $v_{end}$, $E$ is a multiset of edges, and $\lambda: E \to \Sigma$ is the edge labeling function.

Relationship explanation instances, on the other hand, capture the actual data instances from the knowledge base and are used to support an explanation pattern. Intuitively, given the knowledge base $G$, a pair of related entities that map to two nodes $v_{start}$ and $v_{end}$ in $G$, and an explanation pattern $p$, explanation instances for $p$ can be defined based on mappings from $p$ to $G$, identifying the subgraphs of $G$ that satisfy the explanation pattern.

² We use the term primary relationships to distinguish them from the derived relationships that REX will infer during the construction of the explanations.

³ In practice, the IDs are system generated, but for simplicity of discussion, we adopt readable titles/names as the IDs.

Figure 4: Example explanation patterns. Panels: (a) Spouse (brad pitt, angelina jolie); (b) Co-starring (brad pitt, angelina jolie); (c) Co-starring & producing (brad pitt, julia roberts); (d) Collaborating with same director (brad pitt, angelina jolie).

Figure 5 panels: (a) Non-essential pattern (brad pitt, angelina jolie); (b) Decomposable pattern (brad pitt, angelina jolie).

Definition 2 (Relationship Explanation Instance). Given the knowledge base $G = (V, E, \lambda)$, an explanation pattern $p = (V', E', \lambda', v'_{start}, v'_{end})$, and target nodes $v_{start}, v_{end} \in V$, an explanation instance of $p$, denoted as $i(p, G, v_{start}, v_{end})$, or $i_p$, is a mapping $f: V' \to V$, where $v'_{start}$ is mapped to $v_{start}$, $v'_{end}$ is mapped to $v_{end}$, and nodes in $V' - \{v'_{start}, v'_{end}\}$ are mapped into $V - \{v_{start}, v_{end}\}$. Edge constraints must be satisfied: $\forall e' = (v'_1, v'_2) \in E'$, there must be an edge $(f(v'_1), f(v'_2))$ with label $\lambda'(e')$ in $G$. The set of all of $p$'s instances is denoted as $I(p, G, v_{start}, v_{end})$, or $I_p$.

For a pair of entities $v_{start}$ and $v_{end}$, a relationship explanation is defined as the pair $(p, I_p)$ consisting of the explanation pattern $p$ and the explanation instances $I_p$, where $|I_p| > 0$.
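Definitions 1 and 2 can be read operationally: fix the two targets, try knowledge-base nodes for each remaining variable, and backtrack whenever an edge constraint fails. The following Python sketch does this on a tiny hypothetical graph; `ToyKB`, `instances`, and the `(var_a, var_b, label, directed)` edge encoding are our own names. Requiring distinct variables to map to distinct nodes (`injective=True`) is an assumption on our part, since Definition 2 only asks for a mapping. The search is exponential in the number of variables and only illustrates the definition.

```python
class ToyKB:
    """Hypothetical toy knowledge base with directed and undirected labeled edges."""

    def __init__(self):
        self.nodes = set()
        self.directed = set()    # (u, v, label)
        self.undirected = set()  # (frozenset({u, v}), label)

    def add(self, u, v, label, directed=True):
        self.nodes.update((u, v))
        if directed:
            self.directed.add((u, v, label))
        else:
            self.undirected.add((frozenset((u, v)), label))

    def has_edge(self, u, v, label, directed):
        if directed:
            return (u, v, label) in self.directed
        return (frozenset((u, v)), label) in self.undirected


def instances(kb, variables, edges, v_start, v_end, injective=True):
    """Enumerate instances of a pattern (Definition 2) by backtracking.

    edges: (var_a, var_b, label, directed); 'start'/'end' name the targets.
    Returns one dict per instance, mapping each non-target variable to a node.
    """
    candidates = sorted(kb.nodes - {v_start, v_end})  # non-targets only
    results = []

    def consistent(f):
        # Check every edge constraint whose two endpoints are already mapped.
        return all(
            kb.has_edge(f[a], f[b], label, directed)
            for a, b, label, directed in edges
            if a in f and b in f
        )

    def extend(f, rest):
        if not rest:
            results.append({var: f[var] for var in variables})
            return
        var = rest[0]
        for node in candidates:
            if injective and node in f.values():
                continue  # assumed: distinct variables take distinct nodes
            f[var] = node
            if consistent(f):
                extend(f, rest[1:])
            del f[var]

    mapping = {"start": v_start, "end": v_end}
    if consistent(mapping):  # constraints between the two targets only
        extend(mapping, list(variables))
    return results


kb = ToyKB()
kb.add("brad pitt", "angelina jolie", "spouse", directed=False)
kb.add("brad pitt", "mr. & mrs. smith", "starring")
kb.add("angelina jolie", "mr. & mrs. smith", "starring")
kb.add("brad pitt", "ocean's eleven", "starring")
kb.add("julia roberts", "ocean's eleven", "starring")

co_starring = [("start", "v0", "starring", True), ("end", "v0", "starring", True)]
spouse = [("start", "end", "spouse", False)]
```

For example, `instances(kb, ["v0"], co_starring, "brad pitt", "angelina jolie")` returns `[{"v0": "mr. & mrs. smith"}]`, so the co-starring pattern with this instance set is a relationship explanation for the pair, whereas the spouse pattern has no instance for ('brad pitt', 'julia roberts') and yields no explanation there.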

Example 1. Figure 4 illustrates some relationship explanation patterns that have at least one instance in our entertainment knowledge base between 'Brad Pitt' and 'Angelina Jolie' or 'Julia Roberts'. In particular, Figure 4(a) shows the simplest pattern, the spouse relationship. Figure 4(b) shows the co-starring relationship pattern, i.e., 'Brad Pitt' and 'Angelina Jolie' starred together in one or more movies (collectively represented as the variable node $v_0$). Figures 4(c) and 4(d) illustrate more complicated relationship explanation patterns: the former adds the producing relationship between 'Brad Pitt' and the movie variable $v_0$ to produce an explanation pattern slightly more complicated than co-starring, while the latter introduces one additional movie variable ($v_2$) and one director variable ($v_1$) to form the "collaborating with same director" explanation pattern.

2.3 Properties of Explanations 

Definitions 1 and 2 allow a very large space of possible explanations, some of which may not be semantically meaningful. This prompted us to identify desirable structural properties of the explanations, which are described below. We note that since the structures of the instances are enforced by their corresponding patterns, we discuss the structural properties in terms of the patterns. Later, in Section 4, we describe how instances are critical in determining the interestingness of the explanations.

Essentiality 

We want to capture the desideratum that explanation patterns contain only "essential" nodes or edges, i.e., all nodes and edges should be integral to the connection between the target nodes. In the definition below, we give a syntactic characterization based on the graph structure of the explanation pattern.

Figure 5: Example non-minimal explanation patterns.

Definition 3 (Essentiality). A node $v$ (or an edge $e$) in an explanation pattern $p = (V, E, \lambda, v_{start}, v_{end})$ is essential if there is a simple path (i.e., without repeating nodes or edges, and considering edges as undirected) through $v$ (or $e$) from $v_{start}$ to $v_{end}$. $p$ is said to be essential if all of its nodes and edges are essential.

Example 2. Figure 5(a) shows a structure that is not essential: the node $v_1$ and the edge $(v_1, v_0)$ are not essential, since they are not on any simple path from $v_{start}$ to $v_{end}$.
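Definition 3 can be tested by brute force on small patterns: enumerate every simple path from $v_{start}$ to $v_{end}$ in the undirected view of the pattern and record which nodes and edges those paths touch. The sketch below assumes patterns are lists of `(u, v, label)` edges over variable names with targets `"start"` and `"end"`; this encoding and the function names are hypothetical. The pattern used in the test mimics Figure 5(a) as described in Example 2.

```python
from collections import defaultdict


def essential_parts(edges, start="start", end="end"):
    """Return (nodes, edge ids) lying on some simple start-end path.

    edges: list of (u, v, label); direction is ignored, as in Definition 3.
    Exhaustive DFS over simple paths: suitable for small patterns only.
    """
    adj = defaultdict(list)
    for eid, (u, v, _label) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    nodes_on, edges_on = set(), set()

    def dfs(node, path_nodes, path_edges):
        if node == end:  # a complete simple path: mark everything on it
            nodes_on.update(path_nodes)
            edges_on.update(path_edges)
            return
        for nxt, eid in adj[node]:
            if nxt not in path_nodes:  # never revisit a node
                dfs(nxt, path_nodes + [nxt], path_edges + [eid])

    dfs(start, [start], [])
    return nodes_on, edges_on


def is_essential(edges, start="start", end="end"):
    all_nodes = {x for u, v, _ in edges for x in (u, v)}
    nodes_on, edges_on = essential_parts(edges, start, end)
    return nodes_on == all_nodes and edges_on == set(range(len(edges)))


# Co-starring plus a director v1 of the movie v0 (cf. Figure 5(a)).
fig5a = [("start", "v0", "starring"), ("end", "v0", "starring"),
         ("v1", "v0", "directing")]
```

Here `essential_parts(fig5a)` marks only `start`, `v0`, `end` and edges 0 and 1, so the director edge makes the pattern non-essential, while the co-starring edges alone form an essential pattern.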

Non-essential nodes and edges can be meaningful. For example, in Figure 5(a), $v_1$ provides information about the director of the movie node $v_0$, which can be interesting to users. In essence, this is akin to putting attribute constraints on the essential nodes. However, the space of non-essential graphs is extremely large, since they can be arbitrary graphs. As a result, in this paper we only consider explanation patterns that are essential. Non-essential nodes and edges, as well as attribute constraints on essential nodes, can be added in a separate stage once a candidate set of the most interesting essential patterns has been generated; the details of this extension are beyond the scope of the current study.

Non-decomposability 

The next desideratum is that we should not be able to "decompose" an explanation pattern into an equivalent set of smaller explanation patterns. From an intuitive semantic perspective, given an explanation pattern $p = (V, E, \lambda, v_{start}, v_{end})$, $p$ is decomposable if there exist two explanation patterns $p_1 = (V_1, E_1, \lambda_1, v^1_{start}, v^1_{end})$ and $p_2 = (V_2, E_2, \lambda_2, v^2_{start}, v^2_{end})$ such that $V_1, V_2 \subset V$, and for all knowledge base instances and entity pairs we have $(I_{p_1} \neq \emptyset \wedge I_{p_2} \neq \emptyset) \Rightarrow I_p \neq \emptyset$. In other words, for a decomposable pattern, whenever both "sub-patterns" have some instances, the entire pattern must also have an instance. The following is a formal definition that syntactically characterizes decomposability using the graph structure of explanation patterns.

Definition 4 (Decomposability). An explanation pattern $p = (V, E, \lambda, v_{start}, v_{end})$ is decomposable if there exists a partition of $E$ into $E_1, E_2$ such that $\nexists v \in V - \{v_{start}, v_{end}\}$ that is an endpoint of an edge $e_1 \in E_1$ as well as an endpoint of an edge $e_2 \in E_2$. $p$ is said to be non-decomposable if it is not decomposable.

Example 3. The explanation pattern in Figure 5(b) can be decomposed into the two disjoint explanation patterns of Figures 4(a) and 4(b). The edge partitions $\{(v_{start}, \text{spouse}, v_{end})\}$ and $\{(v_{start}, \text{starring}, v_0), (v_{end}, \text{starring}, v_0)\}$ do not share any nodes (besides the two target nodes).
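Definition 4 reduces to a connectivity question: link two edges whenever they share a non-target endpoint; the pattern is decomposable exactly when this yields more than one group of edges, since any one group and the remaining edges then form the required partition. The union-find sketch below assumes a hypothetical `(u, v, label)` edge encoding with targets named `"start"` and `"end"`; the test mimics Example 3.

```python
def edge_components(edges, targets=("start", "end")):
    """Group edge ids into classes linked through shared non-target nodes."""
    parent = list(range(len(edges)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    first_edge_at = {}  # non-target node -> first edge id touching it
    for eid, (u, v, _label) in enumerate(edges):
        for node in (u, v):
            if node in targets:
                continue  # sharing only a target node does not link edges
            if node in first_edge_at:
                parent[find(eid)] = find(first_edge_at[node])
            else:
                first_edge_at[node] = eid
    groups = {}
    for eid in range(len(edges)):
        groups.setdefault(find(eid), []).append(eid)
    return list(groups.values())


def is_decomposable(edges, targets=("start", "end")):
    return len(edge_components(edges, targets)) > 1


# Spouse edge plus co-starring through v0 (cf. Figure 5(b)).
fig5b = [("start", "end", "spouse"), ("start", "v0", "starring"),
         ("end", "v0", "starring")]
```

The direct spouse edge touches no non-target node, so it always forms its own group; this is why any pattern mixing it with other edges is decomposable.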

We combine the properties of essentiality and non-decomposability to define the notion of minimality: an explanation pattern is said to be minimal if it is essential and non-decomposable. An explanation is said to be minimal if its explanation pattern is minimal.
3. EXPLANATION ENUMERATION

In this section, we study how to efficiently enumerate minimal explanations up to a limited size $n$ (provided as a system parameter) for a given node pair $v_{start}$ and $v_{end}$ in the knowledge base $G$.

One naive approach is to take advantage of existing graph enumeration algorithms [26] to generate all graph patterns and filter out the patterns that are either non-minimal or have no instances. We call this naive algorithm NaiveEnum, which is illustrated in Algorithm 1, and use it as the baseline in our experiments. During the enumeration, any pattern that is either a duplicate (i.e., isomorphic [11] to a pattern discovered earlier) or has no instance is pruned immediately. If the pattern is minimal, then we add it (and its instances) to the result explanation queue $Q$. However, minimality is not a pruning condition in NaiveEnum, since non-minimal graph patterns could later be expanded into minimal graph patterns under the graph expansion rule of [26]. Not surprisingly, NaiveEnum is inefficient, since it generates many non-minimal explanation patterns and requires an explicit minimality check for each of them.



Algorithm 1 NaiveEnum(G, v_start, v_end, n): Q
1: Q = ∅, Q_p = ∅
2: Append a seed pattern (a graph with a single start node) to Q_p
3: i = 0
4: while i < length of Q_p do
5:   Q'_p = expand(Q_p[i]) (following the graph expansion rules of the graph enumeration algorithm gSpan [26], and recording the start and end nodes)
6:   for p ∈ Q'_p do
7:     I_p = instances of p in G with respect to v_start and v_end (can be computed efficiently from Q_p[i]'s instances and G)
8:     if p is not duplicated and |I_p| > 0 and |p| ≤ n then
9:       Append p to Q_p
10:      if p is minimal then
11:        Append the explanation re = (p, I_p) to Q
12:      end if
13:    end if
14:  end for
15:  i = i + 1
16: end while
17: return Q



3.1 Explanation Enumeration Framework 

Our goal is to design explanation enumeration algorithms that directly generate all and only the minimal explanations with at least one instance in the knowledge base. The intuition behind our algorithm comes from the observation that any minimal explanation pattern is covered by a set of path patterns. This is enforced by the essentiality property in Section 2.3, which states that each node and edge in a minimal explanation pattern must lie on some simple path between the two target nodes. We call the set of path patterns that cover a minimal explanation pattern the covering path pattern set of the explanation pattern:

Definition 5 (Covering Path Pattern Set). Given a minimal explanation pattern $p_0 = (V, E, \lambda, v_{start}, v_{end})$, we say that a multiset of path patterns $S = \{p_1, p_2, \ldots, p_m\}$ is a covering path pattern set if the path patterns in $S$ cover all the edges and nodes in $p_0$; i.e., (1) each $p_i$ ($1 \le i \le m$) maps to a simple path between $v_{start}$ and $v_{end}$ through edges in $E$, and (2) every node in $V$ and every edge in $E$ appears in at least one $p_i$ ($1 \le i \le m$).

Theorem 1. Each minimal explanation pattern must have at least one covering path pattern set.
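A covering path pattern set can be obtained from the simple $v_{start}$-$v_{end}$ paths of a minimal pattern. The Python sketch below (hypothetical names; patterns are lists of `(u, v, label)` edges with targets `"start"` and `"end"`) collects these paths and then greedily picks paths until every edge is covered. Greedy selection is our own choice: it yields a small, not necessarily minimum-cardinality, covering set. The sample pattern is a made-up shape with a direct co-starring route and a longer route through a second variable.

```python
from collections import defaultdict


def simple_paths(edges, start="start", end="end"):
    """All simple start-end paths of a small pattern, as frozensets of edge ids."""
    adj = defaultdict(list)
    for eid, (u, v, _label) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    paths = []

    def dfs(node, seen, used):
        if node == end:
            paths.append(frozenset(used))
            return
        for nxt, eid in adj[node]:
            if nxt not in seen:
                dfs(nxt, seen | {nxt}, used + [eid])

    dfs(start, {start}, [])
    return paths


def covering_path_set(edges, start="start", end="end"):
    """Greedily pick paths covering the most not-yet-covered edges.

    In a minimal pattern every edge lies on some simple path (essentiality),
    so the loop terminates; since every node is an endpoint of some edge,
    covering all edges also covers all nodes.
    """
    remaining = set(range(len(edges)))
    paths = simple_paths(edges, start, end)
    cover = []
    while remaining:
        best = max(paths, key=lambda p: len(p & remaining))
        if not best & remaining:
            raise ValueError("pattern is not essential")
        cover.append(best)
        remaining -= best
    return cover


pattern = [("start", "v2", "starring"), ("end", "v2", "starring"),
           ("start", "v1", "spouse"), ("v1", "v2", "directing")]
```

For this pattern the two simple paths use edges {0, 1} and {1, 2, 3}; neither covers everything alone, so the covering set has two path patterns.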




Figure 6: Example minimal explanations for Kate Winslet and Leonardo DiCaprio. Panels (a), (b), and (c); panel annotation: v2 → revolutionary road.

Proofs of the theorems are omitted due to space constraints. Some minimal explanation patterns might have multiple covering path pattern sets. We also observe that we can compute the instances of a minimal explanation pattern from the instances of the path patterns in its covering path pattern set, instead of evaluating the pattern against the knowledge base from scratch.

Example 4. The minimal explanation pattern $p_0$ in Figure 6(a) has a covering path pattern set containing the path patterns $p_1$ in Figure 6(b) and $p_2$ in Figure 6(c). Similarly, the instance $i_1$ of $p_0$ can be computed from the instance $i_2$ of $p_1$ and the instance $i_1$ of $p_2$.

Theorem 1 suggests a general framework for minimal explanation enumeration: (1) enumerate all path explanation patterns, including their associated instances; (2) generate all the minimal explanation patterns (and their instances) by combining the path explanation patterns (and their instances). We only need to perform explicit instance evaluation for the path explanations, since the instances of all other minimal explanations can be computed from them. When a pattern size limit $n$ (i.e., the number of nodes in the pattern) is specified for minimal explanation patterns, we can derive a corresponding length limit $l$ for the covering path patterns as $l = n - 1$, since a simple path through at most $n$ nodes has at most $n - 1$ edges.



Algorithm 2 GeneralEnumFramework(G, v_start, v_end, n): Q
1: Q_path = PathEnum(G, v_start, v_end, n − 1)
2: Q = PathUnion(Q_path, n)
3: return Q



The general enumeration framework is shown in Algorithm 2. It takes $G$, $v_{start}$, $v_{end}$, and a pattern size limit $n$ as input, and returns all minimal explanations with size up to $n$. In particular, PathEnum enumerates simple path explanations (including the patterns and associated instances) for $v_{start}$ and $v_{end}$ (Section 3.2), with path pattern length up to $n - 1$; all path instances are extracted directly from the knowledge base $G$. PathUnion combines those simple path explanations into the minimal explanations (Section 3.3).

3.2 Path Explanation Enumeration 

Path explanation enumeration takes $v_{start}$ and $v_{end}$ as input, a length limit $l$ and the knowledge base $G$ as parameters, and returns $Q_{path}$, the set of all path patterns for $v_{start}$ and $v_{end}$ with lengths up to $l$ (and their instances). Since path explanation enumeration can be viewed as a special case of keyword search in databases [1, 2, 3, 5, 17, 12, 13, 14, 21, 24, 29, 15] in which the queried keywords match exactly two tuples, we adapt our algorithms from existing solutions instead of inventing new ones. There are two typical paradigms for keyword search in databases: (1) viewing the database as a tuple graph (tuples and their attribute values are considered as nodes, and key/foreign-key relationships as edges) and directly searching for the instance-level connecting structures [5, 17, 3, 12]; (2) first enumerating the schema-level connecting structures (usually called candidate networks, akin to our path patterns here) and then evaluating the candidate networks to find all the instances [1, 2, 13, 14, 21, 24, 29, 15]. We describe our path enumeration algorithms following the first paradigm, since the knowledge base is already represented as a graph. Once all path instances are generated, we group them into path patterns by simply replacing the nodes in the path instances with variables, a relatively straightforward process. However, algorithms and intuitions from both lines of work can be adapted into our framework.
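Grouping path instances into path patterns amounts to forgetting the intermediate nodes and keeping only the sequence of edge labels and directions. In the sketch below, each path instance is a list of `(u, label, direction, v)` steps with direction `"->"`, `"<-"`, or `"--"` (undirected); this encoding, the function name, and the placeholder movie names are hypothetical.

```python
from collections import defaultdict


def group_into_patterns(path_instances):
    """Group path instances by their sequence of (label, direction) pairs.

    Each instance is a list of steps (u, label, direction, v). Dropping the
    concrete nodes u and v (i.e., turning them into variables) leaves the key.
    """
    patterns = defaultdict(list)
    for inst in path_instances:
        key = tuple((label, direction) for _u, label, direction, _v in inst)
        patterns[key].append(inst)
    return dict(patterns)


co_star_1 = [("brad pitt", "starring", "->", "movie_1"),
             ("movie_1", "starring", "<-", "angelina jolie")]
co_star_2 = [("brad pitt", "starring", "->", "movie_2"),
             ("movie_2", "starring", "<-", "angelina jolie")]
married = [("brad pitt", "spouse", "--", "angelina jolie")]
groups = group_into_patterns([co_star_1, co_star_2, married])
```

The two co-starring instances share one key and therefore one path pattern, whose instance set has size two.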

The first path enumeration algorithm, PathEnumBasic, is adapted from BANKS [5]. BANKS runs concurrent single-source shortest path algorithms from each source node and finds the root node connecting a set of source nodes that cover all the keywords. We apply a similar strategy to generate partial paths from both target nodes $v_{start}$ and $v_{end}$ concurrently. We generate all the path instances of length up to $\lceil l/2 \rceil$ starting from $v_{start}$ and all the path instances of length up to $\lfloor l/2 \rfloor$ starting from $v_{end}$, with shorter paths generated first. Two path instances $i_1$ and $i_2$ from opposite directions can be connected to generate a full path instance if they end at a common node (and share no other node, so that the result is a simple path). Although this algorithm is adapted from BANKS, the same intuition also comes from Discover [14] if we consider pattern-level path enumeration: in the candidate network evaluation step of Discover, the optimizer iteratively chooses the most frequent (shared by most other candidate networks) "small" (number of instances restricted by the input keywords) relations to evaluate. In our setting, this is equivalent to iteratively evaluating the shortest unevaluated path patterns connecting to any target node.
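The bidirectional strategy of PathEnumBasic can be sketched as follows: grow partial paths of up to $\lceil l/2 \rceil$ edges from $v_{start}$ and up to $\lfloor l/2 \rfloor$ edges from $v_{end}$, then join pairs that meet at a common node and share no other node. This is our own simplified Python rendering (it materializes all partial paths level by level rather than running concurrent expansions), not the REX implementation; the function names and the edge-list input are assumptions. Partial paths may end at the opposite target but never pass through it, which also captures a direct edge between the targets.

```python
from collections import defaultdict


def partial_paths(adj, source, max_len, stop_at):
    """Simple paths from `source` with 0..max_len edges, shortest first.

    A path may end at `stop_at` (the other target) but never pass through it.
    Each path is a pair (node tuple, edge-id tuple).
    """
    frontier = [((source,), ())]
    out = list(frontier)
    for _ in range(max_len):
        nxt = []
        for nodes, eids in frontier:
            if nodes[-1] == stop_at:
                continue
            for nb, eid in adj[nodes[-1]]:
                if nb not in nodes:
                    nxt.append((nodes + (nb,), eids + (eid,)))
        out.extend(nxt)
        frontier = nxt
    return out


def path_enum_basic(edges, v_start, v_end, l):
    """Path instances of length <= l joined from two half-length expansions."""
    adj = defaultdict(list)
    for eid, (u, v, _label) in enumerate(edges):
        adj[u].append((v, eid))
        adj[v].append((u, eid))
    from_start = partial_paths(adj, v_start, (l + 1) // 2, v_end)
    from_end = partial_paths(adj, v_end, l // 2, v_start)
    ending_at = defaultdict(list)  # meeting node -> partial paths from v_end
    for nodes, eids in from_end:
        ending_at[nodes[-1]].append((nodes, eids))
    full = {}  # edge-id tuple -> node tuple; the dict drops duplicate joins
    for s_nodes, s_eids in from_start:
        meet = s_nodes[-1]
        for e_nodes, e_eids in ending_at.get(meet, []):
            if set(s_nodes) & set(e_nodes) != {meet}:
                continue  # halves share another node: not a simple path
            full[s_eids + e_eids[::-1]] = s_nodes + e_nodes[::-1][1:]
    return sorted(full.items(), key=lambda item: len(item[0]))


edges = [
    ("brad pitt", "angelina jolie", "spouse"),
    ("brad pitt", "mr. & mrs. smith", "starring"),
    ("angelina jolie", "mr. & mrs. smith", "starring"),
    ("brad pitt", "ocean's eleven", "starring"),
    ("julia roberts", "ocean's eleven", "starring"),
]
paths = path_enum_basic(edges, "brad pitt", "angelina jolie", 2)
```

On these toy edges, a length limit of 2 yields the direct spouse edge and the co-starring path through 'mr. & mrs. smith'. A path can arise from several splits (for example, meeting at the end node or at an inner node), which the dictionary keyed by edge ids deduplicates.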

The second path enumeration algorithm, PathEnumPrioritized, is again a direct adaptation, this time from BANKS-II [17], an improved version of BANKS. When generating paths from both target nodes, instead of always expanding the shortest partial paths, an activation score is used to prioritize the expansion. The activation score captures the following intuition: if the expansion from one target node reaches a node with a large degree, further expansion from that node might be very expensive; instead, waiting for the expansion from the other target node might be less expensive. The activation score is defined as follows. Initially, the activation score of each target node is set to 1 divided by its degree. Each time, the algorithm picks the node with the largest activation score and expands the paths ending at that node. During the expansion, the activation score of the node spreads to its non-target neighbors (the activation score spread to each new node is set to the activation score of the original node divided by the degree of the new node), and the activation score of the original node is then set to 0. For each non-target node, the activation scores received from different neighbors are added up. If a node receives activation scores from both target nodes, new connecting paths have been identified. Again, if our algorithm were adapted from candidate network generation and enumeration, the intuition for the same strategy would come from Discover [14] when its cost model for candidate network evaluation takes into consideration the estimated size of join results.
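The activation-score prioritization can be sketched in a few lines. The Python function below keeps one score per (node, side) pair so that it knows which target a score came from, always expands the pair with the largest score, spreads score / degree(neighbor) to non-target neighbors, and zeroes the expanded score. It only reports meeting nodes (in discovery order) rather than the connecting paths, caps the work at a hypothetical `max_steps`, and ignores direct edges between the targets; all names are our own, and this is a simplification of the scheme described above, not a faithful port of BANKS-II.

```python
from collections import defaultdict


def activation_meetings(adj, v_start, v_end, max_steps=100):
    """Sketch of activation-score prioritized expansion between two targets.

    adj: node -> list of neighbors (a symmetric, undirected view of the graph).
    Returns non-target nodes in the order they are first reached from both
    sides. Only meeting nodes are tracked, not the paths themselves.
    """
    deg = {v: max(1, len(ns)) for v, ns in adj.items()}
    act = {(v_start, "start"): 1 / deg[v_start], (v_end, "end"): 1 / deg[v_end]}
    reached = defaultdict(set)  # node -> sides whose activation reached it
    meetings = []
    for _ in range(max_steps):
        live = [key for key, score in act.items() if score > 0]
        if not live:
            break
        node, side = max(live, key=act.get)  # largest activation goes first
        score = act[(node, side)]
        act[(node, side)] = 0.0  # the expanded node's score is spent
        for nb in adj[node]:
            if nb in (v_start, v_end):
                continue  # scores only spread to non-target neighbors
            act[(nb, side)] = act.get((nb, side), 0.0) + score / deg.get(nb, 1)
            reached[nb].add(side)
            if len(reached[nb]) == 2 and nb not in meetings:
                meetings.append(nb)
    return meetings


adj = {
    "a": ["m", "h"], "b": ["m"], "m": ["a", "b"],
    "h": ["a", "x1", "x2", "x3"], "x1": ["h"], "x2": ["h"], "x3": ["h"],
}
```

In this toy graph, the end node 'b' has degree 1 and is expanded first; the start node 'a' is expanded next, and the meeting node 'm' is found before any neighbor of the high-degree node 'h' is expanded.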

3.3 Path Explanation Combination 

Path explanation combination takes the length-limited path explanations $Q_{path}$ as input and the pattern size limit $n$ as a parameter, and returns $Q$, the set of all minimal explanations with limited pattern size. Combining path explanations to generate minimal explanations is a non-trivial task. Any set of path explanation patterns could be a covering path pattern set for some minimal explanation patterns, and there are many ways of combining the path patterns in a covering path pattern set. To better understand how we can generate all the minimal explanation patterns (and hence the explanations), we partition the set of all minimal explanation patterns, MinP, into disjoint sets according to the minimal cardinality (number of path patterns) of any covering path pattern set of a minimal explanation pattern:

$$\mathit{MinP} = \bigcup_{k=1}^{\infty} \mathit{MinP}(k) \qquad (1)$$

where $\mathit{MinP}(k)$ represents the set of minimal explanation patterns whose minimal covering path pattern set cardinality is $k$. In particular, $\mathit{MinP}(1)$ represents all path patterns. We can extend the notion of covering path pattern set to include non-path minimal patterns:

Definition 6 (Covering Pattern Set). Given a minimal explanation pattern $p_0 = (V, E, \lambda, v_{start}, v_{end})$, we say that a multiset of patterns $S = \{p_1, p_2, \ldots, p_m\}$ is a covering pattern set if the patterns in $S$ cover all the edges and nodes in $p_0$; i.e., (1) each $p_i$ ($1 \le i \le m$) maps to a sub-component of $p_0$ connecting $v_{start}$ and $v_{end}$ through edges in $E$, and (2) every node in $V$ and every edge in $E$ appears in at least one $p_i$ ($1 \le i \le m$).

Just as with covering path pattern sets, given a knowledge base, the instances of a minimal pattern can be computed from the instances of the patterns in its covering pattern set. The following theorem shows that $\mathit{MinP}(k)$, $k > 1$, can be derived from minimal patterns with smaller cardinality:

Theorem 2. Each explanation pattern in $\mathit{MinP}(k)$ ($k > 1$) must have a covering pattern set composed of a pattern in $\mathit{MinP}(k-1)$ and a pattern in $\mathit{MinP}(1)$.

Theorem 2 suggests that, starting from $\mathit{MinP}(1)$, we can iteratively enumerate $\mathit{MinP}(k)$, $k > 1$, from $\mathit{MinP}(k-1)$ and $\mathit{MinP}(1)$. Our first path explanation combination algorithm, PathUnionBasic (Section 3.3.1), directly applies this finding to reduce the enumeration space. In Section 3.3.2, we discuss additional pruning opportunities for PathUnionBasic and propose an even more efficient combination algorithm, PathUnionPrune.

3.3.1 PathUnionBasic

Algorithm 3 illustrates the pseudocode for PathUnionBasic, and 
we explain its critical components as follows: 

Enumeration (Lines 1-15): Path explanations in $Q_{path}$ are used as the seed explanations and put in an explanation queue $Q$. For each explanation $re$ in $Q$, the algorithm combines $re$ with each path explanation in $Q_{path}$ to generate new minimal explanations. The Explanation Merging component ensures that the generated explanation patterns are minimal and that each is associated with at least one instance. The Duplication Checking component ensures that only unique explanations are appended to $Q$ (i.e., duplicates are pruned). The process stops when all explanations in $Q$ have been expanded and no more explanations can be generated. All the minimal explanations with limited pattern size are guaranteed to be in $Q$ at the end of the process (proof omitted due to space constraints).
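Duplication Checking needs a pattern isomorphism test that keeps the two targets fixed. For the small patterns enumerated here, a brute-force test over all renamings of the non-target variables is enough to illustrate the idea. The sketch below is our own: the names are hypothetical, patterns are lists of `(u, v, label)` edges with targets `"start"` and `"end"`, undirected edges are assumed to be stored in one canonical orientation, and a production system would more likely use a canonical labeling.

```python
from collections import Counter
from itertools import permutations


def isomorphic(p1, p2):
    """Brute-force check that two small patterns agree up to variable renaming.

    'start' and 'end' must map to themselves; edge multisets must coincide.
    """
    def variables(p):
        return sorted({x for u, v, _ in p for x in (u, v)} - {"start", "end"})

    vars1, vars2 = variables(p1), variables(p2)
    if len(vars1) != len(vars2) or len(p1) != len(p2):
        return False
    target = Counter(p2)
    for perm in permutations(vars2):
        g = dict(zip(vars1, perm), start="start", end="end")
        if Counter((g[u], g[v], label) for u, v, label in p1) == target:
            return True
    return False


def duplicated(pattern, seen):
    """Mirror of the duplicated() function: is pattern isomorphic to any in seen?"""
    return any(isomorphic(pattern, other) for other in seen)


co_star_a = [("start", "v0", "starring"), ("end", "v0", "starring")]
co_star_b = [("start", "v9", "starring"), ("end", "v9", "starring")]
spouse = [("start", "end", "spouse")]
```

The factorial cost in the number of variables is acceptable only because the pattern size limit $n$ keeps patterns small.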

Explanation Merging (Lines 24-41): To define the merge of two explanations, we consider a partial one-to-one mapping between the patterns of the two explanations, say $p_1 = (V_1, E_1, \lambda_1, v^1_{start}, v^1_{end})$ and $p_2 = (V_2, E_2, \lambda_2, v^2_{start}, v^2_{end})$:
Algorithm 3 PathUnionBasic(Q_path, n): Q
1: Q = Q_path, Q_expand = Q_path
2: while Q_expand ≠ ∅ do
3:   Q_new = ∅
4:   for all (re_1, re_2) pair in Q_expand × Q_path do
5:     Q_temp = merge(re_1, re_2, n)
6:     for re ∈ Q_temp do
7:       if duplicated(re, Q ∪ Q_new) = False then
8:         Append re to Q_new
9:       end if
10:    end for
11:  end for
12:  Append Q_new to Q
13:  Q_expand = Q_new
14: end while
15: return Q
16: function duplicated(re, Q): duplicated
17:   for re_i ∈ Q do
18:     if there exists an isomorphism between re's pattern and re_i's pattern then
19:       return True
20:     end if
21:   end for
22:   return False
23: end function
24: function merge(re_1, re_2, n): Q_new
25:   (p_1, I_p1) = re_1's pattern and instances
26:   (p_2, I_p2) = re_2's pattern and instances
27:   Q_new = ∅
28:   for all partial one-to-one mappings f from p_1.V to p_2.V do
29:     p_new = p_1 ∪_f p_2
30:     I_pnew = ∅
31:     for all (i_1, i_2) pair in I_p1 × I_p2 do
32:       if i_1 and i_2 agree on every pair of matched nodes then
33:         Append i_new = i_1 ∪_f i_2 to I_pnew
34:       end if
35:     end for
36:     if |p_new| ≤ n and |I_pnew| > 0 then
37:       Append re_new = (p_new, I_pnew) to Q_new
38:     end if
39:   end for
40:   return Q_new
41: end function



(1) $v^1_{start}$ and $v^1_{end}$ must be mapped to $v^2_{start}$ and $v^2_{end}$, respectively.

(2) A non-target node $v_1 \in V_1 - \{v^1_{start}, v^1_{end}\}$ of $p_1$ can either be mapped to a non-target node $v_2 \in V_2 - \{v^2_{start}, v^2_{end}\}$ of $p_2$ or not be mapped to any node. (The same restriction applies to $V_2$.)

(3) The mapping is one-to-one (where a mapping exists).

(4) At least one non-target node of $p_1$ must be mapped to a non-target node of $p_2$.

Given this partial one-to-one mapping function $f$, a new explanation pattern can be merged from $p_1$ and $p_2$ following $f$. We use the operator $\cup_f$ to represent this merging: all nodes and edges of both patterns are put into the new pattern, with each pair of matched nodes merged into one node. If there are multiple edges with the same label between a pair of nodes in the new pattern, they are merged as well. Since each node and edge of the new pattern comes from one of two minimal explanation patterns, it is guaranteed to lie on a single path between the target nodes; therefore, the new pattern is essential. On the other hand, requirement (4) of the mapping guarantees that the new pattern is also non-decomposable. Therefore, the new explanation pattern is minimal. The instances of the new explanation can be generated by enforcing the same mapping on each pair of instances from $re_1$ and $re_2$, with the requirement that the two instances agree on every pair of matched nodes. The new explanation is kept only if it has at least one instance.
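The merge operator can be sketched in Python as below. This is a hedged illustration, not the paper's code: it assumes a hypothetical representation in which a pattern is a pair (node ids, labeled edges), the target nodes are always named `"start"` and `"end"`, and an instance is a dict from pattern node to entity. It enumerates mappings obeying conditions (1)-(4), merges matched nodes, collapses parallel same-label edges by using sets, and keeps only merges with at least one consistent instance. The size test `len(nodes) <= n` is our reading of the pattern size limit.

```python
from itertools import combinations, permutations
from typing import Dict, FrozenSet, Iterator, List, Tuple

Edge = Tuple[str, str, str]                       # (source, label, target)
Pattern = Tuple[FrozenSet[str], FrozenSet[Edge]]  # (nodes, labeled edges)
Instance = Dict[str, str]                         # pattern node -> entity
Explanation = Tuple[Pattern, List[Instance]]
TARGETS = ("start", "end")  # assumed ids of the two target nodes


def partial_mappings(v1: List[str], v2: List[str]) -> Iterator[Dict[str, str]]:
    """Injective partial maps between non-target nodes with >= 1 pair (condition 4)."""
    for size in range(1, min(len(v1), len(v2)) + 1):
        for src in combinations(v1, size):
            for dst in permutations(v2, size):
                yield dict(zip(src, dst))


def merge(re1: Explanation, re2: Explanation, n: int) -> List[Explanation]:
    (nodes1, edges1), inst1 = re1
    (nodes2, edges2), inst2 = re2
    out: List[Explanation] = []
    free1 = sorted(nodes1 - set(TARGETS))
    free2 = sorted(nodes2 - set(TARGETS))
    for partial in partial_mappings(free1, free2):
        f = {**partial, "start": "start", "end": "end"}  # condition (1)
        inv = {b: a for a, b in f.items()}
        # Matched p2 nodes take p1's id; unmatched ones get a fresh, unused id.
        taken, ren = set(nodes1), {}
        for v in sorted(nodes2):
            if v in inv:
                ren[v] = inv[v]
            else:
                name = "p2:" + v
                while name in taken:
                    name += "'"
                taken.add(name)
                ren[v] = name
        nodes = frozenset(nodes1) | frozenset(ren.values())
        if len(nodes) > n:  # assumed pattern size limit on |p_new.V|
            continue
        # Sets collapse parallel edges that carry the same label.
        edges = frozenset(edges1) | frozenset((ren[s], l, ren[t]) for s, l, t in edges2)
        instances: List[Instance] = []
        for i1 in inst1:
            for i2 in inst2:
                if all(i1[a] == i2[b] for a, b in f.items()):  # agree on matched nodes
                    new = dict(i1)
                    new.update({ren[v]: e for v, e in i2.items()})
                    instances.append(new)
        if instances:  # keep the merge only if it has at least one instance
            out.append(((nodes, edges), instances))
    return out
```

In this representation, two co-starring-style paths that share a movie node merge into one three-node pattern whose instances are exactly the pairs of path instances that agree on that movie.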

Example 5. Consider the two patterns $p_1$ in Figure 6(b) and $p_2$ in Figure 6(c). A valid partial one-to-one mapping between the two patterns is $p_1.v_{start} \mapsto p_2.v_{start}$, $p_1.v_{end} \mapsto p_2.v_{end}$, nothing $\mapsto p_2.v_1$, $p_1.v_2 \mapsto p_2.v_2$. Combining $p_1$ and $p_2$ following the mapping yields the pattern $p_0$. The instance $i_1$ of $p_0$ can be computed from $i_2$ of $p_1$ and $i_1$ of $p_2$ following the mapping.

Duplication Checking (Line 16 - 23): An explanation could be 
generated multiple times during the enumeration (e.g., combina- 
tion of different pairs of minimal explanations could yield the same 
minimal explanation). We perform a duplicate check on each new explanation by testing its explanation pattern for graph isomorphism [11] against the patterns of all existing explanations. If an isomorphism is found, the new explanation is a duplicate and is therefore ignored.
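Because explanation patterns are tiny (the experiments cap pattern size at 5), even a brute-force check is workable. The sketch below is a stand-in for a general isomorphism algorithm such as [11], not REX's actual routine; it assumes the same hypothetical representation as before (node-id set plus `(source, label, target)` edges) and additionally assumes that an isomorphism must map the start node to the start node and the end node to the end node.

```python
from itertools import permutations
from typing import FrozenSet, Iterable, Tuple

Edge = Tuple[str, str, str]                       # (source, label, target)
Pattern = Tuple[FrozenSet[str], FrozenSet[Edge]]  # (nodes, labeled edges)
TARGETS = {"start", "end"}  # assumed ids of the target nodes


def isomorphic(p: Pattern, q: Pattern) -> bool:
    """Labeled-graph isomorphism that keeps the target nodes fixed.

    Tries every bijection between the non-target nodes, which is feasible
    only because explanation patterns have a handful of nodes."""
    (pn, pe), (qn, qe) = p, q
    if len(pn) != len(qn) or len(pe) != len(qe):
        return False
    p_free = sorted(pn - TARGETS)
    q_free = sorted(qn - TARGETS)
    if len(p_free) != len(q_free):
        return False
    for perm in permutations(q_free):
        m = dict(zip(p_free, perm))
        m.update({t: t for t in TARGETS})
        if {(m[s], l, m[t]) for s, l, t in pe} == set(qe):
            return True
    return False


def duplicated(pattern: Pattern, seen: Iterable[Pattern]) -> bool:
    """True if some already-found pattern is isomorphic to `pattern`."""
    return any(isomorphic(pattern, other) for other in seen)
```

Fixing the targets matters here: a pattern and its "mirror image" with start and end swapped describe different relationships and are not treated as duplicates.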

3.3.2 PathUnion with Pruning 

PathUnionBasic generates all, and only, the minimal explanations with at least 1 instance. Therefore, it is much more efficient than the baseline algorithm. However, since a minimal explanation might be generated multiple times during the enumeration (indicating that some of the combinations are unnecessary), the efficiency of the algorithm is still limited by the number of times we need to merge minimal explanations. The following theorem allows us to decrease the number of merges required:

Theorem 3. Each explanation pattern in $MinP(k)$ ($k > 2$) must have a covering pattern set $\{p_0, p_1\}$ of size 2 such that $p_0, p_1 \in MinP(k - 1)$, and $p_0$ and $p_1$ share a $MinP(k - 2)$ subcomponent $p_2$; i.e., the pattern graph of $p_2$ is a subgraph of the patterns of $p_0$ and $p_1$, and the start and end nodes of $p_2$ map to the start and end nodes of $p_0$ and $p_1$.

Another way to interpret this theorem is as follows. Let $p_1 \in MinP(k)$ ($k > 2$), $p_2 \in MinP(k - 1)$, and $p_3 \in MinP(1)$. In order to generate $p_1$ from $p_2$ and $p_3$, there must exist $p_4$ and $p_5$ that satisfy the following conditions: $p_5 \in MinP(k - 1)$; $p_4 \in MinP(k - 2)$ and $p_4$ is a subcomponent of $p_2$; and $p_5$ can be merged from $p_4$ and $p_3$. Based on this interpretation, we can reduce the number of times we need to combine a minimal explanation with a path explanation. Specifically, during the enumeration, for each explanation in $Q$ whose pattern $p_2$ is in $MinP(k - 1)$, we record the pairs of a $MinP(k - 2)$ pattern and a $MinP(1)$ path (and hence the corresponding explanations) that it was generated from. For an explanation with pattern $p_2 \in MinP(k - 1)$, by comparing its composition history with those of the other explanations whose patterns are in $MinP(k - 1)$ and enforcing the requirement from Theorem 3, we can decide which subset of paths should be merged with $p_2$. The pseudocode of the enumeration algorithm with pruning is shown in Algorithm 4; we call this algorithm PathUnionPrune. We use queues $H_{expand}$ and $H_{new}$ to store the composition history of the explanations corresponding to $MinP(k - 1)$ and $MinP(k)$, respectively.
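The path-selection step of PathUnionPrune can be sketched as follows. This is a hedged reading of Algorithm 4, not the authors' code; `candidate_paths` is a hypothetical name, and we assume each history entry `H_expand[i]` is a list of `(parent index, path index)` pairs, where the parent index points into the previous expansion level. A path is tried on `Q_expand[i1]` only if some explanation in the current level was built from one of the same parents together with that path, which is the shared-subcomponent requirement of Theorem 3.

```python
from typing import List, Set, Tuple

# H_expand[i] lists the (parent index, path index) pairs that produced
# Q_expand[i]; parent indices refer to the previous expansion level.
History = List[Tuple[int, int]]


def candidate_paths(i1: int, h_expand: List[History], first_round: bool,
                    num_paths: int) -> Set[int]:
    """Indices of path explanations worth merging with Q_expand[i1]."""
    if first_round:
        # Q_expand is still Q_path, so there is no history: try every path.
        return set(range(num_paths))
    parents = {j1 for j1, _ in h_expand[i1]}
    s_path: Set[int] = set()
    for hist in h_expand:              # every i2 in Q_expand (including i1)
        for j1b, j2b in hist:
            if j1b in parents:         # shares a MinP(k - 2) parent (Theorem 3)
                s_path.add(j2b)
    return s_path
```

For example, if explanations 0 and 1 of the current level were both built from parent 0 (with paths 1 and 2), while explanation 2 came from parent 1, then explanation 0 is only merged with paths 1 and 2 rather than with every path.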

4. INTERESTINGNESS MEASURES 
AND EXPLANATION RANKING 

When the number of minimal explanations is larger than what we 
can expect users to consume, it is important to rank them in order of 
their "interestingness." This interestingness measure can be defined 
in a variety of different ways and is often subjective. In this paper, 
we aim to present a comprehensive set of such measures and design 
Algorithm 4 PathUnionPrune(Q_path, n): Q
1: Q = Q_path; Q_expand = Q_path
2: while Q_expand ≠ ∅ do
3:   Q_new = ∅; H_new = ∅
4:   for all i1 in [0 .. length(Q_expand) - 1] do
5:     S_path = ∅
6:     if Q_expand = Q_path then
7:       S_path = [0 .. length(Q_path) - 1]
8:     else
9:       for all i2 in [0 .. length(Q_expand) - 1] do
10:        for all pair ((j1, j2), (j1', j2')) in H_expand[i1] × H_expand[i2] with j1 = j1' do
11:          Add j2' to S_path
12:        end for
13:      end for
14:    end if
15:    for all i2 in S_path do
16:      Q_temp = merge(Q_expand[i1], Q_path[i2], n)
17:      for re ∈ Q_temp do
18:        if duplicated(re, Q) = False then
19:          if duplicated(re, Q_new) = False then
20:            Append re to Q_new
21:            Append ∅ to H_new
22:          end if
23:          i_re = re's index in Q_new
24:          Append (i1, i2) to H_new[i_re]
25:        end if
26:      end for
27:    end for
28:  end for
29:  Append Q_new to Q
30:  Q_expand = Q_new; H_expand = H_new
31: end while
32: return Q

efficient algorithms for computing them. In Section 5, we conduct 
user studies to analyze the effectiveness of our proposed measures. 

We start by formally defining a generic interestingness measure. We pay particular attention to one of the key properties of a measure, namely monotonicity. We shall see that anti-monotonicity, which holds for some of our measures, can be used for pruning in the enumeration and ranking of explanations.

Definition 7 (Measure and Monotonicity). An interestingness measure $M$ is a function that takes as input the knowledge base $G = (V, E, \lambda)$, an explanation pattern $p = (V', E', \lambda', v'_{start}, v'_{end})$, and target nodes $v_{start}, v_{end} \in G.V$, and returns a number $M(G, p, v_{start}, v_{end}) \in \mathbb{R}$.

We say that a measure $M$ is monotonic (anti-monotonic, resp.) if and only if

$$M(G, p_1 = (V'_1, E'_1, \lambda'_1, v'^{1}_{start}, v'^{1}_{end}), v_{start}, v_{end}) \ \ge (\le, \text{resp.}) \ M(G, p_2 = (V'_2, E'_2, \lambda'_2, v'^{2}_{start}, v'^{2}_{end}), v_{start}, v_{end})$$

whenever the graph $G_2$ induced by $V'_2, E'_2, \lambda'_2$ is a subgraph of $G_1$ induced by $V'_1, E'_1, \lambda'_1$.

Note that although an interestingness measure is defined in terms 
of an explanation pattern, by including the knowledge base as one 
of the inputs to the measure function, the corresponding instances 
can also be derived. Therefore, an interestingness measure actually 
measures the interestingness of explanations. 

Most existing measures for connecting structures are derived from their topological structures; examples include the size measure and the random walk measure, which we discuss in Section 4.1. However, these measures do not capture aggregated information about the instances, e.g., that two actors co-starred in 10 movies. Therefore, we propose two novel families of interestingness measures: aggregate measures and distributional measures. Aggregate measures are obtained by aggregating over individual instances. One
intuitive aggregate measure is the count measure, where the inter- 
estingness of an explanation is proportional to the number of ex- 
planation instances obtained by applying the explanation pattern to 
the knowledge base. We can compare simple aggregate measures 
against those of other pairs of entities to produce distributional 
measures. We describe aggregate and distributional measures in 
Sections 4.2 and 4.3 respectively. 

4.1 Structure-based measures 

The structure of an explanation pattern can affect the interest- 
ingness of an explanation. These kinds of interestingness measures 
are frequently used in existing works [1, 2, 3, 5, 17, 12, 13, 14, 
21, 15, 8, 10, 18, 22, 25]. We describe two representatives in this 
section: the size measure and the random walk measure. The size of a pattern is a simple but useful summary of its structural interestingness, and it can easily be used together with any other interestingness measure. Another structural interestingness measure
we consider is based on an extension of the random walk process 
described in [10]: each connecting instance graph is regarded as 
an electrical network (e.g. each edge represents a resistor) and the 
amount of current delivered from the start entity to the end entity 
is used as the interestingness of the connecting graph. In our case, 
we apply the random walk on the pattern and use the result as the 
interestingness measure for the explanation. 
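To make the electrical-network view concrete, the sketch below computes the plain current delivered through a pattern graph. It is a simplification under stated assumptions, not the extension of [10] nor REX's exact variant: every pattern edge is treated as an undirected 1-ohm resistor, the start node is held at 1 V and the end node at 0 V, and the function (hypothetically named `delivered_current`) returns the current leaving the start node, i.e., the effective conductance. It also assumes every node lies on some path between the targets, which holds for minimal patterns; otherwise the linear system is singular.

```python
from typing import Dict, List, Tuple


def delivered_current(nodes: List[str], edges: List[Tuple[str, str]],
                      s: str, t: str) -> float:
    """Current from s to t with s at 1 V, t at 0 V, and 1-ohm edges."""
    inner = [v for v in nodes if v not in (s, t)]
    idx = {v: k for k, v in enumerate(inner)}
    adj: Dict[str, List[str]] = {v: [] for v in nodes}
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    m = len(inner)
    # Kirchhoff's current law at each inner node: deg(v)*V_v - sum(V_nbr) = 0,
    # with the known voltages of s (1 V) and t (0 V) moved to the right-hand side.
    a = [[0.0] * m for _ in range(m)]
    b = [0.0] * m
    for v in inner:
        r = idx[v]
        a[r][r] = float(len(adj[v]))
        for w in adj[v]:
            if w == s:
                b[r] += 1.0
            elif w != t:
                a[r][idx[w]] -= 1.0
    # Gaussian elimination with partial pivoting.
    for c in range(m):
        p = max(range(c, m), key=lambda r: abs(a[r][c]))
        a[c], a[p] = a[p], a[c]
        b[c], b[p] = b[p], b[c]
        for r in range(c + 1, m):
            f = a[r][c] / a[c][c]
            for k in range(c, m):
                a[r][k] -= f * a[c][k]
            b[r] -= f * b[c]
    volt = [0.0] * m
    for r in range(m - 1, -1, -1):
        volt[r] = (b[r] - sum(a[r][k] * volt[k] for k in range(r + 1, m))) / a[r][r]
    voltage = {v: volt[idx[v]] for v in inner}
    voltage[s], voltage[t] = 1.0, 0.0
    # Current leaving s through each incident unit resistor.
    return sum(1.0 - voltage[w] for w in adj[s])
```

Under this model a direct edge delivers a current of 1, a two-hop path 1/2, a three-hop path 1/3, and two parallel two-hop paths 1, so shorter and more redundant connections score higher.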

4.2 Aggregate Measures 

Aggregate measures follow the intuition that the more instances 
an explanation has, the more interesting it is. For example, consider the co-starring explanation in Figure 4(b): the more movies its movie variable can map to, the higher the aggregate measure and the more interesting the explanation. We distinguish two ways of
aggregating the number of instances: count and monocount. 

Count 

The count measure simply gives the total number of distinct instances an explanation has. Formally, we have:

$$M_{count}(G, p, v_{start}, v_{end}) = |\{f \mid f \text{ satisfies Definition 2}\}|$$

While intuitive to define, $M_{count}$ is neither monotonic nor anti-monotonic [4], which makes it difficult to compute efficiently because it offers no pruning opportunities.

Monocount 

To address this shortcoming of $M_{count}$, we propose an alternative count measure that has the anti-monotonicity property. Given $G$, $p = (V', E', \lambda', v'_{start}, v'_{end})$, and the target nodes, let $uniq(v)$, $v \in V'$, denote the number of distinct assignments made to variable $v$ over all instances:

$$uniq(v) = |\{f(v) \mid f \text{ satisfies Definition 2}\}|$$

The monocount of $p$ is the smallest number of distinct assignments over all variables (except the two target nodes):

$$M_{monocount}(G, p, v_{start}, v_{end}) = \min_{v \in V' - \{v'_{start}, v'_{end}\}} uniq(v)$$

We override the above formula and define monocount to be 1 in the special case that there is a direct edge between the target entities.

Example 6. Let us assume that in Figure 6(a) there is another instance with $v_1$ mapping to "sam mendes" and $v_2$ mapping to "revolutionary road II". Then in this case $uniq(v_1) = 1$ and $uniq(v_2) = 2$; therefore $\min_{v_i} uniq(v_i) = 1$ and the monocount is 1. In comparison, the count would be 2 in this case.

Note that when there is a single non-target variable, $M_{monocount} = M_{count}$. Our measure is an extension of the anti-monotonic support of subgraphs within a single graph that was introduced in [6].
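Both aggregate measures are simple to compute once the instances are available. The sketch below assumes a hypothetical representation in which each instance is a dict from pattern variable to entity, with the targets named `"start"` and `"end"`; the entity names in the usage example are placeholders modeled on Example 6 (one director shared by two different movies).

```python
from typing import Dict, List, Sequence

Instance = Dict[str, str]  # pattern variable -> entity (hypothetical layout)


def count(instances: List[Instance]) -> int:
    """M_count: number of distinct instances of the pattern."""
    return len({frozenset(i.items()) for i in instances})


def monocount(instances: List[Instance],
              targets: Sequence[str] = ("start", "end")) -> int:
    """M_monocount: fewest distinct entities assigned to any non-target variable."""
    variables = {v for i in instances for v in i} - set(targets)
    if not variables:
        # Only a direct edge between the targets has no non-target variable;
        # the text defines monocount to be 1 in that case.
        return 1
    return min(len({i[v] for i in instances}) for v in variables)


# Usage mirroring Example 6 (placeholder entities):
instances = [
    {"start": "A", "end": "B", "v1": "sam mendes", "v2": "film-1"},
    {"start": "A", "end": "B", "v1": "sam mendes", "v2": "film-2"},
]
```

Here `count(instances)` is 2, while `monocount(instances)` is 1 because `v1` only ever takes one value.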

4.3 Distribution-Based Measures 

Aggregate measures are suitable for comparing explanations for 
a given pair of target entities. However, they do not capture the "rar- 
ity" of an explanation across different pairs of target entities. For 
example, a spousal relationship always has a count of 1, but it is ar- 
guably more interesting than a co-starring relationship with a count 
of 1. This is because co-starring relationships are much more com- 
mon than the spousal relationships. To capture such rarity informa- 
tion, we propose two distributional measures — local and global — 
that compare the aggregate measure of an explanation against the 
aggregate measures of a set of explanations obtained by varying the 
target nodes. ^ 

Let $M_{agg}$ be the specific aggregate measure we adopt, and let $\{a_1, a_2, \ldots, a_n\}$ be the sequence of $M_{agg}$ values in increasing order. The local and global distributions $D^l = \{(a^l_i, c^l_i)\}$ and $D^g = \{(a^g_i, c^g_i)\}$ are defined below, where the former is obtained by varying only the end target node and the latter by varying both target nodes:

$$c^l_i = |\{y \in G.V \mid M_{agg}(G, p, v_{start}, y) = a^l_i\}|$$

$$c^g_i = |\{(x, y) \in G.V \times G.V \mid M_{agg}(G, p, x, y) = a^g_i\}|$$

Intuitively, $c^l_i$ and $c^g_i$ give the number of entity pairs whose explanations produce the aggregate values $a^l_i$ and $a^g_i$, respectively. The entire distribution of these count values is then used to compute the rarity of the given explanation and entity pair using standard statistical techniques. In particular, we compute the position of the given explanation with respect to the distribution: letting $A$ be the value of $M_{agg}$ for the given explanation and $D = \{(a_1, c_1), \ldots, (a_n, c_n)\}$ be the distribution to be compared against, we have:

$$M_{position} = \sum_{i \mid a_i > A} c_i$$

Another alternative is to count how many standard deviations $A$ is away from the mean of $D$, which turns out to be similarly effective as $M_{position}$. We omit the details here due to space constraints.

Example 7. Consider the co-starring explanation (Figure 4(b)) for Brad Pitt and Angelina Jolie. The corresponding count is 1 since they co-starred in only 1 movie. The local distribution of counts for Brad Pitt and any other actor/actress is as follows:

$$D^l = \{(1, 130), (2, 8), (3, 10), (4, 2)\}$$

Therefore, the corresponding position in the local distribution is 8 + 10 + 2 = 20. In contrast, their spousal explanation (Figure 4(a)) also has a count of 1. However, its position in the local distribution is 0, since no other person has a larger spousal count with Brad Pitt. Therefore, by comparing the positions in the local distribution, we can infer that the spousal explanation is more interesting than the co-starring explanation.
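The position computation reduces to a few lines once the aggregate values of the compared entity pairs are known. In this sketch, `distribution` and `position` are hypothetical helper names, and we assume the caller has already computed one $M_{agg}$ value per entity pair (varying only the end node for $D^l$, or both nodes for $D^g$).

```python
from collections import Counter
from typing import Iterable, List, Tuple


def distribution(values: Iterable[int]) -> List[Tuple[int, int]]:
    """Build D = [(a_i, c_i)] in increasing order of a_i from one aggregate
    value per compared entity pair."""
    return sorted(Counter(values).items())


def position(a: int, dist: List[Tuple[int, int]]) -> int:
    """M_position: number of entity pairs whose aggregate value exceeds A."""
    return sum(c for a_i, c in dist if a_i > a)


# Local distribution from Example 7 (co-starring counts for Brad Pitt).
d_local = [(1, 130), (2, 8), (3, 10), (4, 2)]
```

With this data, `position(1, d_local)` reproduces the value 8 + 10 + 2 = 20 from Example 7, and a lower position means a rarer, and hence more interesting, explanation.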



^Although used in a completely different domain, aggregated mea- 
sures and distribution-based measures are analogous to the TF- 
IDF measure in IR. 



4.4 Explanation Ranking 

In this section, we discuss how to efficiently rank the explanations for a given pair of target entities. Specifically, given an interestingness measure and a parameter k, the explanation ranking algorithm returns a ranked list of the top-k most interesting explanations based on the interestingness measure.



Algorithm 5 GeneralRankFramework(G, v_start, v_end, n, M, k): Q
1: Q = GeneralEnumFramework(G, v_start, v_end, n)
2: Q_int = ∅
3: for re ∈ Q do
4:   Append M(G, re.pattern, v_start, v_end) to Q_int
5: end for
6: Sort Q based on Q_int
7: Q = first k entries in Q
8: return Q



Algorithm 5 illustrates the general ranking framework, which in- 
volves three steps: explanation enumeration (based on Section 3), 
interestingness computation, and explanation ranking. This general 
ranking algorithm can be applied to all interestingness measures 
discussed in the previous subsections. 
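As a minimal sketch of the three-step framework, assuming the enumeration step is supplied as a callable and the measure as a function of one explanation (both hypothetical stand-ins for GeneralEnumFramework and $M$), Algorithm 5 becomes:

```python
from typing import Callable, List, TypeVar

E = TypeVar("E")


def general_rank(enumerate_explanations: Callable[[], List[E]],
                 measure: Callable[[E], float], k: int,
                 higher_is_better: bool = True) -> List[E]:
    """Enumerate, score every explanation, sort, and keep the first k."""
    q = enumerate_explanations()                    # step 1: enumeration
    q_int = [measure(re) for re in q]               # step 2: interestingness
    order = sorted(range(len(q)), key=q_int.__getitem__,
                   reverse=higher_is_better)        # step 3: sort by Q_int
    return [q[i] for i in order[:k]]                # keep the first k entries
```

The `higher_is_better` flag is our addition: for $M_{position}$ a smaller value is more interesting, so such a measure would be ranked with `higher_is_better=False`. Every explanation is scored, which is what the specialized algorithms below try to avoid.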

For certain interestingness measures, however, we can design 
more efficient ranking algorithms: increased efficiency can be ob- 
tained by aggressively pruning explanations while interleaving the 
enumeration, interestingness computation, and ranking steps. The 
pruning for distribution based measures is described in Section 5.3.2. 
Here, we briefly describe the case of ranking based on anti- 
monotonic interestingness measures. 

Recall the anti-monotonicity property from Section 4 (which the monocount measure satisfies); the following theorem allows us to prune enumerations when considering anti-monotonic measures.

Theorem 4. Given the knowledge base $G = (V, E, \lambda)$, target nodes $v_{start}, v_{end}$, and an anti-monotonic interestingness measure $M$, suppose a relationship explanation $re' = (p', I')$ is derived from a relationship explanation $re = (p, I)$ using PathUnionBasic (Algorithm 3) or PathUnionPrune (Algorithm 4). Then $M(G, p, v_{start}, v_{end}) \ge M(G, p', v_{start}, v_{end})$. (Therefore, if $re$ is not among the top-k most interesting explanations, no $re'$ derived from it is.)

Intuitively, any expansion of an explanation can only reduce the value of an anti-monotonic measure. Using the theorem, we can integrate the three steps of the general ranking algorithm by maintaining a current top-k list of the most interesting explanations during enumeration. Upon generation of each explanation, we perform the following steps:

Step 1: Calculate the interestingness of the explanation.

Step 2: Update the top-k list of explanations; explanations not in the top-k list are pruned.

Step 3: Continue expansion only from the current set of top-k explanations.

Finally, the top-k most interesting explanations are returned. Intuitively, this algorithm is more efficient than the general ranking algorithm since fewer explanations are enumerated, and this intuition is supported by our experimental evaluation (Section 5).
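The interleaved procedure can be sketched with a bounded min-heap. This is an illustrative sketch rather than REX's implementation: `top_k_anti_monotone` is a hypothetical name, `expand` stands in for merging an explanation with path explanations, explanations are assumed hashable, and larger measure values are assumed to be more interesting. Pruning is safe because the k-th best score only grows, so an explanation outside the current top-k can never have a descendant that beats it (Theorem 4).

```python
import heapq
from itertools import count
from typing import Callable, Hashable, List, Set, Tuple, TypeVar

E = TypeVar("E", bound=Hashable)


def top_k_anti_monotone(seeds: List[E], expand: Callable[[E], List[E]],
                        measure: Callable[[E], float], k: int) -> List[Tuple[float, E]]:
    """Level-wise enumeration that only expands explanations in the top-k."""
    tie = count()                           # tie-breaker so E is never compared
    heap: List[Tuple[float, int, E]] = []   # min-heap holding the current top-k
    seen: Set[E] = set()

    def offer(re: E) -> None:
        # Steps 1 and 2: score the explanation and update the top-k list.
        seen.add(re)
        item = (measure(re), next(tie), re)
        if len(heap) < k:
            heapq.heappush(heap, item)
        elif item[0] > heap[0][0]:
            heapq.heapreplace(heap, item)

    for re in seeds:
        if re not in seen:
            offer(re)
    frontier = list(seeds)
    while frontier:
        in_top = {re for _, _, re in heap}
        next_frontier: List[E] = []
        for re in frontier:
            if re not in in_top:
                continue  # Step 3: no expansion of re can enter the top-k
            for child in expand(re):
                if child not in seen:
                    offer(child)
                    next_frontier.append(child)
        frontier = next_frontier
    return sorted(((s, re) for s, _, re in heap), key=lambda x: x[0], reverse=True)
```

In a toy run where explanation `x` expands to `2x` and `2x + 1` and scores `100 / x` (anti-monotonic along expansions), asking for the top 3 from seed `1` returns explanations 1, 2, and 3 without ever expanding 4 through 7.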

5. EXPERIMENTS 

We implemented the REX system in Python and performed ex- 
tensive experiments using a real world knowledge base to evaluate 
its efficiency and effectiveness. Specifically, we analyze the performance of the explanation enumeration algorithms and the ranking algorithms in Sections 5.2 and 5.3, respectively. We also perform extensive quality assessments based on detailed user studies
Figure 7: Comparison of explanation enumeration algorithms.

(Section 5.4) to verify the necessity of our explanation definition 
(e.g., including non-path explanations) and the effectiveness of ex- 
planations generated by REX. All experiments are performed on a 
MacBook Pro with 2.53 GHz Dual Core CPU and 4GB RAM. 

5.1 Experimental Settings 

Knowledge Base: We extracted from DBpedia (http://dbpedia.org/) 
all entertainment-related entities and relationships to form our experiment knowledge base. There are a total of 20 entity types and 2,795 primary relationship types. Overall, the knowledge base contains 200K entities and over 1.3M primary relationships.

Target Entity Pairs: We generate related entities for evaluation 
as follows: we randomly select an entity as the start entity from 
the knowledge base and then randomly select one of its related entities as suggested by the search engine^. We categorize the pairs based on their "connectedness", computed as the number of simple paths that connect the two entities within a given length limit^: low (connectedness < 30), medium (connectedness 30 to 100), and high (connectedness > 100). From each of the three groups, we randomly pick 10 related pairs; these 30 related entity pairs are used for performance evaluation.
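Connectedness can be computed with a depth-limited search over simple paths. The sketch below is a hedged illustration: `count_simple_paths` and `connectedness_group` are hypothetical names, the knowledge base is assumed to be given as an undirected adjacency list, the default length limit of 4 follows the footnote, and the exact handling of the bucket boundaries (30 and 100) is our assumption.

```python
from typing import Dict, List, Set


def count_simple_paths(adj: Dict[str, List[str]], s: str, t: str,
                       max_len: int = 4) -> int:
    """Number of simple paths from s to t with at most max_len edges (DFS)."""
    def dfs(u: str, visited: Set[str], depth: int) -> int:
        if u == t:
            return 1
        if depth == max_len:
            return 0
        total = 0
        for w in adj.get(u, []):
            if w not in visited:
                visited.add(w)
                total += dfs(w, visited, depth + 1)
                visited.remove(w)
        return total

    return dfs(s, {s}, 0)


def connectedness_group(n_paths: int) -> str:
    """Bucket an entity pair by connectedness (boundary handling assumed)."""
    if n_paths < 30:
        return "low"
    if n_paths <= 100:
        return "medium"
    return "high"
```

Because the search enumerates every simple path up to the limit, its cost grows quickly with node degree, which is consistent with the observation that density rather than total knowledge-base size drives enumeration cost.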

5.2 Performance of Enumeration Algorithms 
Figure 8: Explanation enumeration time vs. number of explanation instances.

In this section, we compare the performance of our minimal ex- 
planation enumeration algorithms. As discussed in Section 3, there 
are 3 types of optimizations we consider: (a) using path enumera- 
tion and union framework instead of graph enumeration, (b) pick- 
ing the best path enumeration algorithm from existing solutions, (c) 
optimizing the path union algorithm. To illustrate the usefulness of 



^http://search.yahoo.com/

^We set the length limit to 4 to match the pattern size limit of 5 in 
our experiments. 



each optimization decision, we consider the following combinations: 1. NaiveEnum (using graph enumeration; note that graph enumeration cannot be used in combination with the other two types of optimizations), 2. PathEnumNaive^ + PathUnionBasic, 3. PathEnumBasic + PathUnionBasic (using the path enumeration and union framework with baseline algorithms for both components), 4. PathEnumPrioritized + PathUnionBasic (using the prioritized path enumeration algorithm with the basic path union algorithm), and 5. PathEnumPrioritized + PathUnionPrune (using the improved path enumeration and union algorithms). We set the pattern size limit to 5 in the experiments.

Figure 7 shows the efficiency of the different explanation enumeration algorithms. Any combination of the path enumeration and union algorithms, including the most naive version PathEnumNaive + PathUnionBasic, shows orders-of-magnitude improvement over NaiveEnum for all three entity pair groups (low, medium, and high). This demonstrates the efficiency of our framework, which does not generate any non-minimal structure during the enumeration. The comparison of PathEnumBasic + PathUnionBasic and PathEnumPrioritized + PathUnionBasic indicates that PathEnumPrioritized is slightly more efficient than PathEnumBasic (and, as expected, both are better than PathEnumNaive). Although this improvement is not our contribution, the result tells us which path enumeration algorithm is the best choice. Finally, the comparison of PathEnumPrioritized + PathUnionBasic and PathEnumPrioritized + PathUnionPrune shows that PathUnionPrune is more efficient than PathUnionBasic due to the additional shared-component pruning performed during the enumeration process: on average, PathUnionPrune takes only one third of the time of PathUnionBasic.

Figure 8 shows the enumeration time (using PathEnumPrioritized + PathUnionPrune) for all 30 entity pairs, where the x-axis is the number of explanation instances for the pair and the y-axis is the enumeration time. The enumeration time increases linearly with the number of explanation instances between the pairs, which reaches as high as 5000, demonstrating the scalability of the REX system.

5.3 Performance of Ranking Algorithms 

In this section we evaluate the performance of ranking algo- 
rithms. The running time with ranking is affected by two com- 
ponents: the time for enumeration and the time for computing the 
measure. For simple aggregate measures such as count and mono- 
count, the enumeration time dominates. However, for distributional 
measures, measure computation takes longer (because the same 
measure needs to be computed for additional sample entity pairs). 
We show that our pruning algorithms improve performance for all measures, by reducing either the enumeration time or the measure computation time.

5.3.1 Top-k Pruning for Anti-monotonic Measures 
Figure 9 shows the effects of top-k (k = 10) pruning for the measure $M_{monocount}$, following the top-k pruning algorithm for



^PathEnumNaive is the most naive path enumeration algorithm: it enumerates all length-limited paths from the start entity and checks whether each path ends at the end entity. It is worse than any existing solution; therefore, we do not include it as the baseline in Section 3.2. However, because it uses the most naive design without any optimization, its improvement over NaiveEnum shows the benefits of adopting our framework.

^It is worth noting that density rather than the total size of the 
knowledge base affects the performance of enumeration. There- 
fore the performance would not be affected much even if we adopt 
the full DBPedia knowledge base in our experiments. 
Figure 9: Effect of top-k (k = 10) pruning on monocount computation

Figure 10: Average computation time for different k in top-k pruning

anti-monotonic measures discussed in Section 4.4. In all cases, top- 
k pruning reduces the running time to under 0.5 seconds, and it is 
sometimes several hundred times more efficient than full enumera- 
tion. In Figure 10, we examine how different values of k affect the 
running time. As expected, when k is very small, using top-k prun- 
ing significantly improves efficiency. As k becomes larger, the im- 
provement diminishes. When k is very large, the pruning algorithm 
is close to (and, in the medium group, slower than) the non-pruning algorithm, since very few results are pruned and maintaining the top-k list adds overhead.

5.3.2 Computing and Pruning for Distribution-Based 
Measures 
Figure 11: Average time for computing the top-10 explanations using the distribution-based measure $M_{position}$

Despite the fact that distribution-based measures as described in 
Section 4.3 are not anti-monotonic and therefore not subject to the 
aggressive pruning introduced in Section 4.4, we can potentially 



optimize their computation by integrating the measure computation 
with explanation ranking. Here, we use the local distribution-based 
position measure to illustrate how the pruning can be done. 

Specifically, given a pair of target nodes $v_{start}$ and $v_{end}$, the knowledge base $G$ with all the primary relationships stored in a relational table $R(eid1, eid2, rel)$^, an explanation pattern $re$, and its $M_{count}$ value $c$, the local distributional position of $re$ based on $M_{count}$ can be computed by evaluating a SQL query describing $re$'s pattern. Assuming $re$ is the co-starring relationship, the corresponding SQL statement is as follows:

SELECT v_start, R2.eid1, COUNT(*) AS count
FROM R AS R1, R AS R2
WHERE v_start = R1.eid1 AND R1.eid2 = R2.eid2
  AND R1.rel = 'starring'
  AND R2.rel = 'starring'
GROUP BY v_start, R2.eid1
HAVING COUNT(*) > c

The structure of the explanation pattern is encoded in the "FROM" and "WHERE" clauses (e.g., each edge is mapped to a table in the "FROM" clause). Each returned record represents a pair of entities (within the local distribution) whose count is greater than that of the target entity pair. Therefore, the number of records returned by the SQL query gives the desired position of the explanation.

To improve upon the general brute-force Algorithm 5, we maintain a top-k list of explanations while computing the interestingness of the explanations, and modify the SQL query above for optimization. For example, if we know that the current k-th most interesting explanation has a position of p, then we need not compute the exact position of any explanation whose position is guaranteed to exceed p. This optimization can be reflected by simply adding a LIMIT p clause to the SQL query above.
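The query and its LIMIT-based pruning can be tried on a toy table with Python's built-in `sqlite3` module. This is a sketch under stated assumptions, not REX's code: `local_position` and `build_demo_db` are hypothetical names, the data are made up, `R(eid1, eid2, rel)` stores an actor in `eid1` and a movie in `eid2` for `'starring'` rows (as the query in the text implies), and the extra condition `R2.eid1 <> R1.eid1`, which keeps the start entity from being compared with itself, is our assumption based on "any other actor/actress" in Example 7.

```python
import sqlite3
from typing import List, Optional

# Co-starring pattern: R1 and R2 are two 'starring' edges sharing a movie.
POSITION_SQL = """
    SELECT R2.eid1, COUNT(*) AS cnt
    FROM R AS R1, R AS R2
    WHERE R1.eid1 = ? AND R1.eid2 = R2.eid2
      AND R1.rel = 'starring' AND R2.rel = 'starring'
      AND R2.eid1 <> R1.eid1
    GROUP BY R2.eid1
    HAVING COUNT(*) > ?
"""


def local_position(conn: sqlite3.Connection, v_start: str, c: int,
                   limit: Optional[int] = None) -> int:
    """Number of end entities whose co-starring count with v_start exceeds c.

    With limit=p, at most p rows are returned; getting p rows already shows
    the explanation cannot beat the current k-th explanation at position p."""
    sql = POSITION_SQL
    params: List[object] = [v_start, c]
    if limit is not None:
        sql += " LIMIT ?"
        params.append(limit)
    return len(conn.execute(sql, params).fetchall())


def build_demo_db() -> sqlite3.Connection:
    """Toy R(eid1, eid2, rel) table: actors A-D starring in movies m1-m3."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE R (eid1 TEXT, eid2 TEXT, rel TEXT)")
    rows = [("A", "m1"), ("A", "m2"), ("A", "m3"), ("B", "m1"),
            ("B", "m2"), ("C", "m1"), ("D", "m3")]
    conn.executemany("INSERT INTO R VALUES (?, ?, 'starring')", rows)
    return conn
```

In the toy data, A co-stars twice with B and once each with C and D, so the co-starring explanation for (A, C), whose count is 1, has position 1. How much work the LIMIT actually saves depends on the database engine's query plan.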

We implemented this pruning strategy and evaluated its effectiveness for top-k (k = 10) explanation ranking using the distribution-based measure $M_{position}$. There are four different scenarios: local distribution, local distribution with pruning, global distribution, and global distribution with pruning. Since the true global distribution would be prohibitively time-consuming to compute, we use 100 local distributions to estimate the global distribution, with each local distribution associated with a randomly chosen start entity. The computation times in all four scenarios are shown in Figure 11. First, we note that pruning is beneficial regardless of whether the measure is based on the local or the global distribution. In particular, pruning can speed up the computation by a factor of 2 for local distributional measures. However, ranking using the global distributional measure is still quite costly even with pruning. We note that the cost of computing distributional measures can be further decreased by amortizing the computation over different pairs and sharing the computation involved. Also, distributional measures can be computed in parallel, as the counts for different node pairs can be computed separately. Finally, combining distributional measures with other measures could decrease the computation time. For example, we can use some other measure (e.g., size) as the primary comparison index and use distributional measures only to break ties in the less expensive primary comparison. Our experiments show that, on average, ranking based on such combined measures is several times faster than using distributional measures alone.

5.4 Measure Effectiveness 

In this section, we analyze the effectiveness of explanations generated by REX. In Section 5.4.1, we compare the relative effectiveness of different interestingness measures and their combinations.

^The knowledge base can be stored using other data models (e.g., RDF), and the same computation strategy can still be applied.
Measure             P1   P2   P3   P4   P5   Avg
size                50   51   33   51   52   47
random-walk         55   45   41   45   47   47
count               53   39   38   53   45   46
monocount           54   40   40   52   41   45
local-dist          62   47   53   58   59   55
global-dist         61   37   58   61   58   55
size + monocount    67   60   50   61   59   59
size + local-dist   67   60   50   62   60   60

Table 1: Comparing different interestingness measures.



In Section 5.4.2, we show why using only paths is not sufficient to model all possible interesting explanations.

5.4.1 Effectiveness of Interestingness Measures 

We compare the 6 measures discussed in Section 4: size ($M_{size}$), random walk ($M_{walk}$), count ($M_{count}$), monocount ($M_{monocount}$), and position in the local and global distributions ($M^l_{position}$, $M^g_{position}$). We expect that combinations of different measures, especially combinations of structure-based measures (e.g., $M_{size}$) with aggregate and distributional measures (e.g., $M_{count}$, $M_{monocount}$, $M^l_{position}$, $M^g_{position}$), would be very helpful, since they capture the interestingness of explanations from different but complementary directions. Therefore, we also include some combined measures in the results to verify this idea.

We randomly selected 5 entity pairs for this study: P1: (brad pitt, angelina jolie), P2: (kate winslet, leonardo dicaprio), P3: (tom cruise, will smith), P4: (james cameron, kate winslet), P5: (mel gibson, helen hunt). For each pair, each measure is used to rank the top-10 most interesting explanations. The resulting explanations are randomized and mixed together so that users cannot tell which measure ranked a given explanation. Each user is then asked to label each explanation as very relevant (score 2), somewhat relevant (1), or not relevant (0). For each ranking methodology, a DCG-style score is computed as follows:

$$score(M) = m \sum_{i=1}^{10} w_i \times s_i$$

where $m$ is a normalization factor ensuring that the scores fall within [0, 100], $w_i$ are the weights given to each rank position (in our case, $w_i = 1/\log_2(i + 1)$)^, and $s_i$ are the individual explanation scores at position $i$ as ranked by the corresponding measure.
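A small sketch makes the scoring concrete. The text does not give the normalization factor $m$ explicitly; here we assume $m = 100 / (2 \sum_i w_i)$, so that a list labeled entirely "very relevant" scores exactly 100 and an entirely irrelevant list scores 0. The function name `dcg_score` is hypothetical.

```python
import math
from typing import List


def dcg_score(labels: List[int], max_label: int = 2) -> float:
    """DCG-style score in [0, 100] for one measure's ranked list.

    labels[i] is the user label (0, 1 or 2) of the explanation at rank i + 1.
    The normalization m = 100 / (max_label * sum(w_i)) is an assumption."""
    weights = [1.0 / math.log2(i + 1) for i in range(1, len(labels) + 1)]
    raw = sum(w * s for w, s in zip(weights, labels))
    m = 100.0 / (max_label * sum(weights))
    return m * raw
```

Because the weights decay with rank, placing a "very relevant" explanation first scores higher than placing it second, which is the property that lets the scores in Table 1 reward good orderings rather than just good sets.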

A total of 10 users responded to our user study. The average scores of different measures for each entity pair are shown in the first 6 rows of Table 1. The effectiveness of $M_{size}$, $M_{walk}$, $M_{count}$, and $M_{monocount}$ is very similar. (The simplest measure, size, is even slightly better.) As we expected, the two distribution-based measures are statistically better than the simple aggregate measures and structure-based measures. It is interesting to see that, despite its much more limited sampling scope, $M^l_{position}$ performs as well as $M^g_{position}$ in terms of ranking quality. Given that $M^l_{position}$ is much cheaper to compute (Figure 11), we recommend always using $M^l_{position}$ in place of $M^g_{position}$ when distribution-based measures are desired.

During the selection, we removed any pairs with at least one node not recognized by the authors, to ensure that respondents could easily judge the correctness and interestingness of the explanations.

Discounted cumulative gain is a frequently used ranking measure in web search [16].

The exact weight values do not change our results much as long as their relative order is maintained.

We also consider two very simple combinations of the measures: $(M_{size}, M_{monocount})$, which uses $M_{size}$ as the primary comparison index and $M_{monocount}$ as the secondary comparison index, and $(M_{size}, M_{local\text{-}dist})$, which uses $M_{size}$ as the primary comparison index and $M_{local\text{-}dist}$ as the secondary comparison index. Intuitively, we expect these two combinations to be much better than the size measure alone, since the size measure is too coarse-grained to distinguish among all interesting explanations. The results of the combinations are shown in lines 7-8 of Table 1. Both combinations turn out to be better than any individual interestingness measure. These are two very preliminary combinations, and they can be further improved using machine learning techniques. We believe the current results are sufficient to demonstrate the idea, and we leave a detailed study as future work.
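The lexicographic combinations above (a primary comparison index, with ties broken by a secondary index) can be sketched as a sort on a tuple key. This is a minimal illustration, not REX's implementation: the `Explanation` record, its values, and the orientation of each measure (smaller size and larger monocount treated as more interesting) are assumptions made for the example.

```python
from typing import Callable, List, NamedTuple


class Explanation(NamedTuple):
    """Hypothetical record holding precomputed measure values."""
    name: str
    size: int       # assumed: number of nodes in the explanation
    monocount: int  # assumed: pattern frequency in the knowledge base


def rank_lexicographic(
    explanations: List[Explanation],
    primary: Callable[[Explanation], float],
    secondary: Callable[[Explanation], float],
    k: int = 10,
) -> List[Explanation]:
    """Return the top-k explanations, compared by `primary` first with
    ties broken by `secondary`; larger key values rank higher."""
    ranked = sorted(explanations, key=lambda e: (primary(e), secondary(e)), reverse=True)
    return ranked[:k]


candidates = [
    Explanation("E1", size=3, monocount=5),
    Explanation("E2", size=3, monocount=12),
    Explanation("E3", size=4, monocount=40),
    Explanation("E4", size=2, monocount=1),
]
# Smaller explanations first (size is negated), then higher monocount.
top = rank_lexicographic(candidates, lambda e: -e.size, lambda e: e.monocount)
print([e.name for e in top])  # ['E4', 'E2', 'E1', 'E3']
```

Here the secondary measure only matters within a size class (E2 beats E1 because both have size 3), which is exactly where the coarse-grained size measure alone cannot decide.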

Summary: Among individual measures, the distribution-based measures achieve the best effectiveness. Combining a structure-based measure (e.g., size) with an aggregate or distribution-based measure yields better rankings than any individual measure. To achieve the best effectiveness, machine learning algorithms can be used to learn the best combination of all measures; when efficiency is also a concern, the combination can be restricted to anti-monotonic measures (e.g., $M_{size}$, $M_{monocount}$), which still achieves reasonable effectiveness while remaining efficient to compute.
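To illustrate why anti-monotonic measures are cheap to rank by, the following is a generic best-first top-k search over a lattice of patterns in which extending a pattern can never raise its score. It is a hypothetical sketch, not REX's enumeration algorithm; `extend`, `measure`, and the toy label weights are illustrative assumptions.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple


def topk_antimonotone(
    roots: Iterable[Hashable],
    extend: Callable[[Hashable], Iterable[Hashable]],
    measure: Callable[[Hashable], float],
    k: int,
) -> List[Tuple[Hashable, float]]:
    """Best-first top-k search over a pattern lattice.

    Assumes measure(q) <= measure(p) whenever q is in extend(p)
    (anti-monotonicity) and that every pattern is reachable from a root.
    Then no unexplored pattern can beat the frontier maximum, so patterns
    are emitted in non-increasing score order and only the k emitted
    patterns are ever extended.
    """
    counter = 0  # tie-breaker so the heap never compares patterns directly
    heap: list = []
    for p in roots:
        heapq.heappush(heap, (-measure(p), counter, p))
        counter += 1
    results: List[Tuple[Hashable, float]] = []
    seen = set()
    while heap and len(results) < k:
        neg_score, _, p = heapq.heappop(heap)
        if p in seen:
            continue  # same pattern reached through another parent
        seen.add(p)
        results.append((p, -neg_score))
        for q in extend(p):
            if q not in seen:
                heapq.heappush(heap, (-measure(q), counter, q))
                counter += 1
    return results


WEIGHTS = {"a": 1, "b": 2, "c": 3}  # hypothetical edge-label costs


def toy_measure(p):
    # Each added label lowers the score, so the measure is anti-monotonic.
    return 10 - sum(WEIGHTS[x] for x in p)


def toy_extend(p):
    return [p + (x,) for x in WEIGHTS] if len(p) < 3 else []


best = topk_antimonotone([(x,) for x in WEIGHTS], toy_extend, toy_measure, k=3)
print([score for _, score in best])  # [9, 8, 8]
```

Only three patterns are extended here, whereas scoring the whole toy lattice would require all 39 patterns of length 1 to 3.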

5.4.2 Comparing Path and Non-Path Explanations 

Based on the user study in the previous section, for each target entity pair we can pick up to the 10 most interesting explanations according to user judgment. Among all top-5 explanations, only 36% are paths (64% are non-paths); among all top-10 explanations, 38% are paths. These results demonstrate the necessity of including non-paths in the definition of explanations.

6. RELATED WORK 

There are a few recent studies on discovering relationships between various web artifacts. For example, [20] connects two search terms by extracting pairs of pages based on their common search results, and [23] extracts a chain of news articles that connects two given news articles based on shared words. Our work is complementary to these studies: we focus specifically on entities and leverage a rich knowledge base together with a comprehensive set of interestingness measures based on both aggregates and distributions.

Our work is related to the vast literature on keyword search in relational and semi-structured databases [1, 2, 3, 5, 17, 12, 13, 14, 21, 24, 29, 15]. The two major distinctions between REX and these works are: (1) we consider connection structures that are more complex than trees and paths for explaining two entities; (2) we introduce two novel families of pattern-level interestingness measures.

Our path (instance and pattern) enumeration component can be viewed as a special case of keyword search in databases, where the input keywords match exactly two entities. Therefore, we can directly adapt algorithms from these works. The first algorithm, PathEnumBasic, is adapted from BANKS [5], which runs a shortest-path search concurrently from each target node. The same intuition also comes from Discover [14] if we consider pattern-level search: the restriction to "small" relations and the evaluation ordering based on candidate-network sharing "frequency" lead to a very similar solution in our problem setting. The second path enumeration algorithm, PathEnumPrioritized, which uses node activation scores, is adapted from BANKS2 [17]. If we consider pattern-level enumeration, the same intuition can also come from Discover [14] when the cost model used to prioritize candidate network evaluation takes the estimated size of join results into account.

We also require the average score of an explanation to be at least 1 to avoid including uninteresting explanations.

We emphasize that path enumeration algorithms are not our primary contribution, and our framework is flexible enough to take advantage of any state-of-the-art keyword search or path enumeration algorithm. Other related work dealing directly with path enumeration can be found in [28, 19, 7], although these works either address slightly different problem settings or provide intuitions similar to those discussed above.
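The idea of expanding from both target entities can be illustrated with a simplified meet-in-the-middle enumeration of simple paths between two nodes. This is a sketch in the spirit of the concurrent expansion described for BANKS, not PathEnumBasic or PathEnumPrioritized themselves: it ignores edge labels, edge weights, activation scores, and pattern-level grouping, and the adjacency-list graph and function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Graph = Dict[str, List[str]]  # hypothetical undirected adjacency lists
Path = Tuple[str, ...]


def partial_paths(graph: Graph, start: str, depth: int) -> List[Path]:
    """All simple paths that start at `start` and have at most `depth` edges."""
    found: List[Path] = []
    stack: List[Path] = [(start,)]
    while stack:
        path = stack.pop()
        found.append(path)
        if len(path) - 1 < depth:
            for nbr in graph.get(path[-1], []):
                if nbr not in path:  # keep paths simple
                    stack.append(path + (nbr,))
    return found


def enumerate_paths(graph: Graph, s: str, t: str, max_len: int) -> List[Path]:
    """Simple s-t paths with at most `max_len` edges.

    Expands from s and from t separately (each side to about half the
    length limit) and joins the two halves at a shared meeting node.
    """
    left_depth = (max_len + 1) // 2
    right_depth = max_len // 2
    from_t = defaultdict(list)  # meeting node -> partial paths from t ending there
    for p in partial_paths(graph, t, right_depth):
        from_t[p[-1]].append(p)
    results = set()
    for left in partial_paths(graph, s, left_depth):
        for right in from_t.get(left[-1], []):
            full = left + tuple(reversed(right[:-1]))
            if len(set(full)) == len(full):  # reject joins that repeat a node
                results.add(full)  # the set removes paths found via several meeting nodes
    return sorted(results, key=lambda p: (len(p), p))


toy = {"A": ["X", "Y"], "X": ["A", "B"], "Y": ["A", "Z"],
       "Z": ["Y", "B"], "B": ["X", "Z"]}
print(enumerate_paths(toy, "A", "B", max_len=3))
# [('A', 'X', 'B'), ('A', 'Y', 'Z', 'B')]
```

Splitting the length budget between the two sides keeps each expansion shallow, which is the main benefit of searching from both target nodes rather than from one.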

Many keyword search papers also discuss ranking based on various interestingness measures. Most of them focus on interestingness at the instance level. Usually, the size of the connecting structure is used as the basic metric. Other enhancements include taking into consideration edge weights [5, 17], node weights [3], and keyword-to-structure mapping scores [13, 21] inspired by IR techniques. The interestingness measures we propose are orthogonal to these instance-level measures: we capture pattern-level interestingness by properly aggregating (e.g., count-based measures) and normalizing (distribution-based measures) the instance-level measures. Some work has also considered pattern-level interestingness [24, 29]. However, its problem settings are different: it assumes that the user of the system is a domain expert or has a clear search intention (although the user may lack knowledge of the schema or format of the data sources). Therefore, these works mainly rely on user feedback to refine and discover the best queries.

There are also quite a few graph mining papers that mine connecting structures among a set of nodes [8, 10, 18, 22, 25]. However, these algorithms return only a single large connection graph containing many interesting facts, without distilling individual explanations out of it. REX, on the other hand, finds multiple interesting explanations and ranks them to describe different aspects of a relationship.

Our work is also closely related to various studies in the frequent graph mining literature. In particular, [9, 26, 27] describe efficient algorithms for identifying frequent subgraphs in a database of many graphs. While our pruning techniques for anti-monotonic measures are inspired by these algorithms, our problem setting is fundamentally different from their transactional setting: we mine interesting patterns from a single large graph (i.e., the knowledge base) instead of a database of (relatively) small graphs. More recently, [6] studies the notion of pattern frequency in a single-graph setting and proposes monocount, defined as the minimum, over all nodes in the pattern, of the number of distinct nodes in the original graph to which that pattern node maps. Our $M_{monocount}$ is an extension of this notion. Notably, none of these prior works studies distribution-based measures of interestingness.
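The monocount definition attributed to [6] above can be computed directly once the embeddings of a pattern are known. The sketch below implements only that base definition, not the extension used by $M_{monocount}$; the embedding representation (a dict from pattern node to graph node) and the example data are assumptions.

```python
from typing import Dict, Hashable, List, Sequence

Embedding = Dict[Hashable, Hashable]  # pattern node -> graph node (assumed format)


def monocount(pattern_nodes: Sequence[Hashable], embeddings: List[Embedding]) -> int:
    """Minimum, over pattern nodes, of the number of distinct graph nodes
    that the pattern node is mapped to across all embeddings."""
    if not embeddings:
        return 0
    return min(len({emb[v] for emb in embeddings}) for v in pattern_nodes)


# Three hypothetical embeddings of a two-node pattern (u, v).
embs = [{"u": 1, "v": 2}, {"u": 1, "v": 3}, {"u": 4, "v": 2}]
print(monocount(["u", "v"], embs))  # 2
```

In the example the pattern has three embeddings but a monocount of 2, because each pattern node maps to only two distinct graph nodes; counting distinct images rather than raw embeddings is what distinguishes this notion from a plain instance count.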

7. CONCLUSION 

Given the increasing importance of features like "related searches" on major search engines, particularly for entity searches, it is desirable to explain to users why a given pair of entities is related. To the best of our knowledge, our work is the first to propose this relationship explanation problem. We studied the desirable properties of relationship explanations given a knowledge base, and formalized both aggregate-based and distribution-based interestingness measures for ranking explanations. We decomposed the overall problem into two subproblems, explanation enumeration and explanation ranking, and designed and implemented efficient and scalable algorithms for solving both. Extensive experiments with real data show that REX efficiently discovers high-quality explanations over a real-world knowledge base.